Human history defies easy stories


The discovery of part of a 55,000-year-old human skull in Israel will help to answer some
questions about our species’ evolution — but it shows that the tale is complicated. 


When modern humans spread out of Africa and across
Eurasia 60,000-40,000 years ago, they replaced all other
members of the human family, and laid the foundations for
the modern world. Who were these ancestors? They left few fossils, 
and fewer answers. 

A new piece of the puzzle is reported this week on Nature’s website: 
a partial skull found in Manot Cave in northern Israel that has been 
dated to around 55,000 years ago (I. Hershkovitz et al. Nature
http://dx.doi.org/10.1038/nature14134 (2015) and page 541). The skull,
which has a distinctive ‘bun’-shaped occipital bone (the lower-back 
region), resembles those of modern humans found in Europe, dating 
to the Upper Palaeolithic starting around 50,000 years ago. 

Where does the find fit in? Beware simple answers, and, indeed, sim- 
ple questions. There is a temptation when discussing human evolution 
to reconstruct it as a narrative, in which successive species evolved to be 
more like us, and the more like us they became, the more likely they were 
to migrate to other parts of the world and replace pre-existing forms. 

There are at least four things wrong with this. The first is its rather 
imperialist framing, in which evolution and replacement can be justi- 
fied after the fact as a kind of manifest destiny. 

The second is that it dismisses any extinct species as inferior and 
therefore of secondary importance. 

The third is that it assumes the existence of an arrow of progress, in 
which species always evolve towards ourselves, a mistaken view that is 
too welcoming of spurious conceits such as ‘missing links’, and unwilling
to countenance odd side branches such as Homo floresiensis, the
peculiar, dwarf hominin (member of the human family) that lived in 
Indonesia until relatively recent times (see nature.com/hobbit10). 

The fourth, and arguably the most important, is that it misrepre- 
sents the extreme fragmentation of the fossil record, something that 
Charles Darwin recognized, with his usual percipience, as a ‘diffi- 
culty’ with his theory of evolution by natural selection. Darwin was 
(as usual) selling himself short. That evolution has happened is no 
longer in doubt: the shared chemistry and structure of all life, from 
the meanest microbe to the furriest feline, would be testament to that, 
even had no fossils ever been found. 

Fossils offer more than concrete proof that evolution happened. 
They reveal a wealth of organic forms that no longer exist. The only 
species of hominin alive today, as far as we know, is our own, Homo 
sapiens. But this sole estate hides a large number of extinct forms, each 
of which contributed to Earth’s ecology in its own particular way. If 
the present epoch is unusual, it is in the presence of just one species 
of hominin. A mere 50,000 years ago, there were at least four different 
species. There are very likely to have been more. 

Homo sapiens first appears in the fossil record around 200,000 years 
ago in Ethiopia, albeit in a distinctly archaic form. The earliest fossil 
is not the same as the earliest member of a species — H. sapiens is 
probably much older than this. Archaic forms of our species outside
Africa first appear around 90,000 years ago, in the Levant. Another
45,000 years or so were to pass before our species made it to southeast- 
ern Europe, where it appeared amid a spectacular flourish of technol- 
ogy and what we would instantly recognize as art. 

What happened between 90,000 and 45,000 years ago, a period 
ten times the length of recorded history? Only the fossils can tell us, 
and they are few. It seems that the earliest modern humans got to
the Levant and no farther. Mount Carmel in Israel hosts caves, such as
Qafzeh and Skhul, where H. sapiens remains appear in levels older than
those occupied by Neanderthals, Homo neanderthalensis. The replacement
of our own species by Neanderthals seems to be an affront to our
prejudices. So how did humans eventually make it to Europe?


The partial H. sapiens skull from Manot 
Cave goes some way towards providing an answer, as well as hinting at 
how complicated our early history might have been. It looks much more 
modern than skulls from Qafzeh and Skhul. It is also much younger, 
suggesting that the hominin was closer, genetically and evolutionarily, 
to the earliest known European representatives of our species. This skull, 
the simple answer would suggest, represents modern humans poised to 
expand out of Africa and colonize the rest of the world. 

Here comes the ‘but’. Our modern genomes contain Neanderthal
DNA. At some point, our ancestors bred with Neanderthals before 
they became extinct. Does the Manot skull represent that moment? 
We simply do not know. Welcome, Manot skull, to messy reality.


Senate vs science 


A few Republicans agreeing with basic climate 
research is not an environmental victory. 


US President Barack Obama challenged his conservative climate
critics in the annual State of the Union address to Congress on
20 January, arguing that they cannot shy away from modern
climate science. A day later, pushed to take a position, 15 Republicans
voted in favour of an amendment affirming the idea that humans
have a role in climate change. Five went a step further, voting for a
Democratic amendment stating that human activity “significantly
contributes to climate change”. And this is progress?

Although both amendments attracted a majority of the US Senate,
neither achieved the 60-vote threshold required for approval. These
votes are of course purely symbolic, but political types are already busy
reading the tea leaves for the 2016 presidential elections. For some, the
fact that any Republicans, however few, felt compelled to endorse basic 
climate science is a positive sign that the party is once again worried 
about how the issue of climate change will play with US voters. We can 
only hope that it will at last get the attention it deserves in a major US 
election, but it is hard to get too excited. 

The five Republicans who voted in favour of the Democratic amend- 
ment that made the strongest connection between human activity and 
climate change deserve credit for doing so. But the flip side is that 
49 out of the 54 Republicans in the Senate voted against an amendment 
that merely states mainstream scientific theory, as vetted by countless 
researchers, studies and assessments over the course of more than a 
quarter of a century. And 39 refused to agree to a statement that linked 
human activity and climate change in any way. Moreover, it is not clear 
that any of the Republicans, or indeed many of the Democrats, are 
prepared to actually do anything significant about it. 

The upshot is that little has changed. Obama has started to bypass 
Congress to push forward with his own climate regulations wherever 
possible, and he is right to do so (see page 535). If there is any criticism 
to be laid at his feet, it is not that he has been too ambitious with his 
regulatory powers, as suggested by Republicans, but that he has not 
been ambitious enough. His administration could certainly be more 
aggressive with its planned rules for power-plant emissions, as well as 
with methane regulations it is developing for the oil and gas sector. These 
regulations will help to determine whether the United States can capital- 
ize on the shift from coal to natural gas and renewables, such as wind 
and solar, that has helped to reduce the nation’s emissions in recent years. 

For their part, Republicans have focused their energy on the Keystone 
XL oil pipeline from the Canadian tar sands to the US Gulf Coast, with 
leadership in both houses of Congress putting legislation approving it at
the top of their agenda. Environmentalists have done the same, arguing
that Keystone represents a step in the wrong direction that will merely
drive up greenhouse-gas emissions by promoting the development of a
dirty energy source. The reality is that the pipeline, on its own, would not 
have a significant impact on either the US economy or the global climate. 

It will be up to Obama to decide whether the pipeline is in the national 
interest, once the state department finishes its review of the project. 
The president has said that the pipeline will benefit Canadian oil
producers rather than US consumers, given that petrol prices, already
lower than they have been in a long time, are driven by the
international oil market. He has also said that he will approve the
project only if it does not “significantly exacerbate” the problem of
carbon pollution.

In the end, Obama has plenty of wiggle room in terms of how he defines both
‘national interest’ and ‘significant exacerbation’. There are surely better
places to invest from a public perspective, but there are also better 
ways to guide private investments, including oil pipelines. One of them 
is to enact comprehensive climate legislation that clarifies the cost 
of carbon and the basic economics for all energy and infrastructure 
investments. That he has not done this is Obama’s biggest failure on 
the environmental front. 

All is not lost. If the United States can continue to reduce its own
emissions and help to secure meaningful action abroad, then histo- 
rians may yet look back at Obama's presidency as a turning point in 
the battle against global warming. One thing, however, seems clear 
enough: the president’s environmental legacy will not be determined 
by his decision on the Keystone XL pipeline.


Technical support 


Technicians are often underappreciated, but
without them there could be no research. 


An old trick for book reviewers who have little material with
which to judge the temperament of the author is to scrutinize
the acknowledgements. Usually raw and unedited, the way these few
pages of thanks are presented, whether gushing, self-centred or brief,
can often say as much about the writer as the preceding
300 pages. The same is true for the process of science. Beneath the
polished exterior of published academic papers and university press
releases lies another world. And it is a world that can be glimpsed,
more often than not, in the brief acknowledgements of a PhD thesis.

Alongside the praise (through gritted teeth?) for a (largely absent?)
academic supervisor and the earnest gratitude showered on parents,
spouses and pets for pastoral support, there is usually a list of thanks for
Angela, Juan, Denise, Samuel, Ernie and a directory of other essential
first-named extras. This cast of thousands is made up of the support
staff and lab technicians who work behind the scenes to hold up the
entire research enterprise, and who rarely get the attention they deserve.

On page 542, Nature makes a small effort to address this common
oversight. A News Feature places a handful of these support staff front
and centre, and offers details on not just their surnames, but also their
crucial role. They might have more eye-catching job descriptions than
many of their colleagues. But they represent an army of essential workers
who are just as valuable and just as deserving of thanks.

The featured four all have very different occupations. Sarah Davis
creates laboratory glassware; Jim Harrison collects venom from
deadly snakes; Bill Klimm sifts the seas for squid and other inhabitants
of the deep; and Dawn Johnson keeps the digital wheels turning
in a global bioinformatics archive. What they have in common is
their close ties with the researchers they assist, and their remarkable 
and specialized skills. 

Given that technical and support staff are such an important pillar of 
academic life, it is perhaps surprising that so little academic attention 
has been paid to their lot — and whether they are content with it. In 
2011, researchers at King’s College London did publish a rare survey 
of skills and training in the United Kingdom, which raised a series of 
red flags (see go.nature.com/n74jsb). Technical staff are exposed on 
the front line when funding cuts bite: numbers working in university 
departments had decreased across the disciplines, both in absolute 
terms and relative to the number of academics and students whom 
they are expected to support. 

One academic said: “We're skating on thin ice — if people are away 
ill, or on a conference, or on training... it’s a nightmare. If the aca- 
demic department is an engine, then technicians are the engine oil that 
keeps the department running smoothly. Low technician numbers 
now mean that the department is in danger of seizing up.”

University managers should take note: the report warned that 
the increasing trend for centralizing services and technical support 
could weaken the bond between academics and technicians, and so 
threaten research. For example, shared mechanical workshops, formed 
by consolidating the facilities of several departments to save money, 
are unpopular and demoralizing. “University managers sometimes 
seem not to appreciate the vital contribution that workshop technicians
make to research,” the report said. “It is important to highlight
the scope for centralisation to generate problems.” 

We know that PhD students appreciate the efforts of support staff, 
but do more senior scientists? Almost certainly. But do the technicians 
know that? Tell them! Do it today. Print out this 
editorial and pin it up in break rooms and on 
staff notice boards. Let technicians everywhere 
read the following: Angela, Juan, Denise, Samuel, 
Ernie, and all the rest, we salute you.
WORLD VIEW


Gather data to reveal true extent of doping in sport

Drug cheats will not be tackled properly until anti-doping agencies do more to
assess the scale of the problem scientifically, says Roger Pielke Jr.


How many elite athletes take performance-enhancing drugs?
Sporting bodies say that it is a very small minority. But a
documentary broadcast in Germany last month suggested a
much higher figure. Several Russian athletes claimed that nearly all of
their colleagues dope, and with the knowledge of officials. The World
Anti-Doping Agency (WADA) immediately launched an investigation,
which is expected to report this year.

Science helps to keep sport clean by developing tests to screen athletes
for banned substances. Bodies such as WADA and the US Anti-Doping
Agency (USADA) say that they are doing all they can to deter doping.
But they have so far neglected to carry out a simple scientific analysis of
how widespread the problem is. Or if they have, they have not published
the results. This makes it impossible for the rest of us to assess whether
anti-doping policies are working.

Drug testing in sport, as currently implemented, might catch the
occasional cheat and could deter others, but these results do little to
help design an anti-doping strategy, and to independently assess
whether it works. For that, we need to know whether the number of
athletes doping is going up or down. And to do that, we need a reliable
measure of what proportion of athletes dope. The problem, and the best
way to manage it, is very different if 1% of athletes dope than if 50%
of them do.

Although the stated goal of anti-doping agencies is to prevent
prohibited drug use, they simply do not gather the data to enable
evaluation of how effective their policies are. This is despite sporting
bodies across the world spending an estimated US$350 million on
drug testing each year.

Estimating the number of elite athletes who dope is straightforward,
and perfectly suited to the tools of science. Determining this number
is much easier than other efforts by scientists to quantify unknowns,
such as estimates of the number of planets in the Galaxy or whales
in the sea. In probability-speak, it is a ball and urn problem: how do
we determine how many black balls there are in an urn that contains
1,000 white and black balls if we can sample only a small number?

To assess the prevalence of sports doping, such an analysis needs two
things: a reliable estimate of the total population of elite athletes and a
proper randomized testing protocol. The first is readily available. For
instance, at the London 2012 Summer Olympics, nearly 11,000 athletes
participated from more than 200 countries. Each country conducted
Olympic trials with its own pool of registered, domestic competitors
seeking to qualify for the games. For the second requirement, because
screening every athlete over a year is impractical, anti-doping agencies
could carry out randomized tests designed to support
estimates of the prevalence of doping alongside
existing testing programmes at a marginal cost. 
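
The ball-and-urn reasoning can be made concrete with a short sketch. It assumes a simple random sample drawn without replacement, a test that classifies every sampled athlete correctly, and a normal approximation for the interval, which is crude when the number of positives is very small. The function names and the numbers in the usage example are hypothetical and do not come from any anti-doping agency.

```python
import math
import random


def estimate_prevalence(positives, sample_size, population_size, z=1.96):
    """Estimate the share of 'black balls' from a random sample drawn
    without replacement, with an approximate confidence interval.

    Assumes a simple random sample and a perfectly accurate test
    (both are illustrative assumptions).
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    if not 0 <= positives <= sample_size:
        raise ValueError("positives must be between 0 and sample_size")

    p_hat = positives / sample_size
    # Finite population correction: sampling a large share of a small
    # population leaves less uncertainty than sampling with replacement.
    if population_size > 1:
        fpc = (population_size - sample_size) / (population_size - 1)
    else:
        fpc = 0.0
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / sample_size * fpc)
    low = max(0.0, p_hat - z * std_err)
    high = min(1.0, p_hat + z * std_err)
    return p_hat, low, high


def simulate_urn(population_size, true_black, sample_size, seed=0):
    """Draw a random sample from an urn and count the black balls in it."""
    rng = random.Random(seed)
    urn = [1] * true_black + [0] * (population_size - true_black)
    return sum(rng.sample(urn, sample_size))


if __name__ == "__main__":
    # Hypothetical urn: 1,000 balls, 100 of them black, 200 sampled.
    found = simulate_urn(population_size=1000, true_black=100, sample_size=200)
    p_hat, low, high = estimate_prevalence(found, 200, 1000)
    print(f"estimated share: {p_hat:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

The same calculation cannot be applied to strategically targeted tests, because the athletes selected are then not representative of the wider population.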

Current doping tests are anything but random, at least in a statistical 
sense. Some athletes are tested several times, others not at all. In 2013, 
USADA says that it conducted 9,197 tests on 4,640 athletes. Decisions 
on which athletes were subjected to these tests were determined ‘strate- 
gically’ it says. The number of positive tests, then, cannot be used to say 
anything about a broader population. The same is true for existing global 
statistics. WADA says that it tested 176,502 samples (not individual ath- 
letes) in 2013, and that 1% gave ‘adverse analytical findings’ (AAFs). 

But such a red flag does not necessarily mean that doping has 
occurred, because some athletes have exemptions for prohibited sub- 
stances, for instance. Nor do the data allow for the matching of AAFs 
to sanctions against athletes. So of the 176,502 samples, what does a 1% 
AAF actually mean? It is impossible to say, and 
that is the problem. 

Why has there been no effort to quantify the 
problem of doping in sport (or if it has been done, 
why is it not published)? Evidence suggests that 
the leaderships of these organizations do not want 
to know the true extent of doping or their effec- 
tiveness in regulating it. In 2012, Richard Pound, 
the first president of WADA, oversaw an agency 
committee called Lack of Effectiveness of Testing 
Programs. The committee’s report concluded that 
within the sports community, “there is no general 
appetite to undertake the effort and expense of a 
successful effort to deliver doping-free sport”. 

WADA, created after a major drug scandal in 
cycling in the late 1990s, is unique in that it is overseen by governments 
in partnership with non-governmental sports organizations, and oper- 
ates under the provisions of a United Nations treaty. In principle, this 
signifies a public responsibility and expectations of accountability to 
stated goals. The UN treaty gives these agencies legitimacy, and thus no 
excuse not to be transparent. 

In my opinion, anti-doping agencies suffer from a sort of institu- 
tionalized blindness that has been characterized by Steve Rayner, who 
studies science and civilization at the University of Oxford, UK, as the 
“social construction of ignorance”. This is a strategy that organizations
use necessarily to make their way in a complicated world. Organi- 
zations also create zones of ignorance to ‘manage uncomfortable 
knowledge’; and this can sometimes lead to dysfunction. 

In the case of doping in sport, uncomfortable knowledge includes 
the possibility that doping among athletes is much more prevalent than 
is recognized and that anti-doping programmes are not very effective. 
But without a proper effort to gather the data, we just don't know.


Roger Pielke Jr is director of the Center for Science and Technology 
Policy Research at the University of Colorado, Boulder, Colorado, USA. 
e-mail: rpielkejr@gmail.com 
Selections from the scientific literature


AGEING

Less cancer protein, longer life

Mice live longer, and seem to age more slowly, if they express lower
levels of a cancer-promoting protein called MYC.

High levels of MYC favour tumour growth, but some expression of the
protein is required for survival. John Sedivy of Brown University in
Providence, Rhode Island, and his colleagues studied the effects of low
MYC expression in mice. Mice with only one copy of the Myc gene lived
15% longer than those with two copies of the gene, although development
and reproduction in the two groups were the same.

Mice with a single copy of Myc had a faster metabolism, and less severe
age-related conditions such as osteoporosis or the thickening of the
heart tissue.

Cell http://doi.org/znb (2015)


Bird’s flight captured in a box

Researchers have measured the aerodynamic forces of a bird flying
inside a box.

Until now, the aerodynamic lift achieved by free-flying animals has
only been estimated using models. David Lentink of Stanford University
in California and his team built an enclosed device to directly measure
forces generated by a bird’s wings during flight. With each flap, moving
air exerts a force on the walls of the box, which is captured by
sensors. The signals were synchronized with those from a high-speed
camera, which records a bird’s flight from one side of the enclosure to
the other.

The researchers confirmed previous findings that each downstroke of a
bird’s wings generates enough force to lift twice the animal’s body
weight into the air. The device could be used with other animals and
free-flying robots, says the team.

J. R. Soc. Interface 12, 20141283 (2015)


HYDROLOGY

Dams reshape the world’s rivers

Dams have altered 48% of all river flow worldwide. And if all dams
planned for the next few decades are built, that proportion will
nearly double.

Günther Grill of McGill University in Montreal, Canada, and his team
developed two ways to analyse how dams break up and regulate river
flow. They calculated how 6,374 existing dams and 3,377 proposed ones
affected (or would affect) river volume worldwide between 1930 and
2030. The team found significant changes to existing water flow in
rivers such as the Paraná River in South America (pictured). The
biggest future effects would arise from dams being planned for the
Amazon basin.

The models could help engineers to reduce the environmental effects of
new dams.

Environ. Res. Lett. 10, 015001 (2015)


Sodium explosion caught on camera

Chemists have scrutinized a classic piece of bench chemistry, the
explosion that happens when sodium metal hits water, and revised the
thinking of how it works.

On contact with water, the metal produces sodium hydroxide, hydrogen
and heat, which was thought to ignite the hydrogen and cause the
explosion. To delve into this, Pavel Jungwirth at the Czech Academy of
Sciences in Prague and his team used high-speed cameras to capture the
reaction of a drop of a liquid alloy of sodium and potassium with water
at room temperature. They found that spikes of the metal shoot out from
the droplet just 0.4 milliseconds after it enters the water, too fast
to have been expelled by heat. Computer simulations revealed that
sodium atoms at the surface of a small cluster each lose an electron
within picoseconds. The positively charged ions rapidly repel each
other, causing the explosion, while the protruding metal spikes
generate new surface area that drives the reaction.

Nature Chem. http://dx.doi.org/10.1038/nchem.2161 (2015)


Big swings in 
weather to come 


Weather extremes could 
become more common as the 
climate warms this century, 
because extreme cooling 
events in the Pacific Ocean are 
predicted to occur more often. 
La Niña events occur when
the equatorial Pacific cools,
causing droughts and floods
worldwide. Wenju Cai of the
Commonwealth Scientific
and Industrial Research
Organization in Aspendale,
Australia, and his colleagues
analysed the occurrence of
significant La Niñas and related
El Niño events from 1900 to
2099, simulated under rising
concentrations of greenhouse
gases in the atmosphere. The
researchers found that the
number of extreme La Niña
events increased from one
every 23 years to one every
13 years in the twenty-first
century.

Most of the severe La Niñas
will follow severe El Niños,
resulting in wide, annual 
swings between opposite 
extreme weather events, the 
authors suggest. 

Nature Clim. Change http://doi. 
org/zph (2015) 


Ancient hands 
built for tools 


The hands of hominins that 
lived about 3 million years 
ago were capable of clutching 
tools. 

The first tool-using hominin 
is widely believed to have 
been Homo habilis — known 
as the handyman — in part 
because its appearance in the 
fossil record 2.4 million years 
ago coincides with the earliest 
stone tools. To search for 
earlier signs of tool use, a team 
led by Matthew Skinner and 
Tracey Kivell at the University 
of Kent, UK, analysed the 
composition of the hand 
bones of Australopithecus 
africanus fossils from South 
Africa, which are between 
2 million and 3 million 
years old. The ends of 
A. africanus metacarpal 
hand bones (pictured), 
which form the palm, 
resembled those of later 
toolmakers such as Homo 
sapiens and Neanderthals. 

The team concludes 
that A. africanus could 
forcefully grip objects 
using an opposable 
thumb. 
Science 347, 395-399 
(2015) 


ECOLOGY 


Pumas feel the 
fear near humans 


Female pumas that live near 
human populations hunt 
more often but spend less time 
eating their prey than do those 
in less populated areas. 

Humans can cause declines 
in wildlife populations, 
but their effect on animal 
behaviour is less well 
understood. Justine Smith 
and her colleagues at the 
University of California,
Santa Cruz, tagged 30 pumas
(Puma concolor) in California
and tracked their movements
in areas with four different
densities of human housing.
They found that at kill
sites near the most densely
populated areas, female pumas 
spent 42% less time consuming 
their prey than those in the 
least populated regions. To 
compensate, the females in the 
more developed habitats killed 
36% more deer. 

Fear of humans is probably 
driving this behavioural 
change, which could 
have further ecosystem 
effects, such as boosting 
scavenger populations and 
even compromising the 
reproductive health of female 
pumas, the authors speculate. 
Proc. R. Soc. B 282, 20142711 
(2015) 


RESEARCH HIGHLIGHTS


ENVIRONMENTAL SCIENCE

Methane escapes from major city

The ageing pipeline infrastructure of Boston, Massachusetts, is leaking
natural gas (mostly methane, a potent greenhouse gas) at more than
double the rate of previous estimates.

Atmospheric methane levels had plateaued but have been growing
worldwide since 2007, for reasons that are unclear. Kathryn McKain at
Harvard University in Cambridge, Massachusetts, and her colleagues
monitored methane levels at four locations in Boston for a year. They
also used a model of atmospheric processes to determine methane
emissions. They found that 60-100% of the emitted methane was from the
city’s natural-gas system, and that the Boston region is losing about
2.7% of its natural gas: 2-3 times more than other estimates.

Cities that consume natural gas could be a bigger source of atmospheric
methane than was previously thought.

Proc. Natl Acad. Sci. USA http://doi.org/zpk (2015)


How yeast go multicellular

A genetic mutation in single-celled yeast turns it into a multicellular
organism, hinting at how multicellularity might have evolved.

William Ratcliff at the Georgia Institute of Technology in Atlanta and
his co-workers studied a strain of yeast (Saccharomyces cerevisiae) in
which the daughter cells remain attached to the mother cells after
dividing, resulting in multicellular ‘snowflake’ yeast. By
mathematically modelling the way that clusters break off, the authors
conclude that this way of growing makes the cells in each cluster
genetically similar. This allows natural selection to act on the
clusters rather than on individual cells, speeding up multicellular
evolution. A mutation in a gene encoding the protein ACE2 causes the
clusters to form.

After 60 days of selection (400 generations), the yeast evolved bigger
cells (pictured, right; scale bars are 50 µm) compared with those at
14 days (left). The results show how a single mutation can create
multicellular clusters and set the stage for the future evolution of
organismal complexity.

Nature Commun. 6, 6102 (2015)


SOCIAL SELECTION

Popular articles on social media

Celebrating beauty in science writing

Not many people read research articles for the snappy writing.
But Stephen Heard, an ecologist at the University of New
Brunswick in Fredericton, Canada, argues in a blogpost
(go.nature.com/a2xh1m) that scientific writing could be
more readable and even elegant, an observation that set
off a widespread social-media reaction. Heard wrote that
researchers should try livening up their scientific prose to
attract and keep more readers. Isabelle Côté, a marine ecologist
at Simon Fraser University in Burnaby, Canada, tweeted: “Let’s
put some whimsy, humour and beauty in scientific writing.” Anthony
Caravaggi, a conservation biologist at Queen's University Belfast, UK,
tweeted: “I’d love to see less turgidity & more charm.”

NATURE.COM
For more on popular papers: go.nature.com/uwgzik
SEVEN DAYS

The news in brief


Breeders convicted

An Italian court found three employees of a lab-animal breeding company
guilty of animal cruelty on 23 January, sentencing them to prison terms
of between 12 and 18 months. The court said that the three had
mistreated beagle dogs and had put some animals down without good
reason. Marshall BioResources, the US company that operates the Green
Hill facility near Brescia, Italy, had previously been cleared of
wrongdoing, but prosecutors pressed charges against Green Hill director
Roberto Bravi and three other workers, one of whom was later found
innocent. Even if the verdict is upheld by higher courts, the sentences
could still be converted to fines or probation. See
go.nature.com/sbcdvz for more.


Israel frees scientist

A Palestinian physicist who had been detained by the Israeli military
was released on 22 January. Imad Ahmad Barghouthi of Al-Quds University
in Jerusalem was on his way to a scientific conference on 6 December
when he was arrested while attempting to cross from the West Bank to
Jordan. Israel’s reasons for administrative detention are normally kept
secret, but Barghouthi claims that he had been jailed because his
profile picture on Facebook showed him wearing a green scarf, the colour
of the Hamas movement. See go.nature.com/qbgliy for more.


POLICY


US medicine effort

US President Barack Obama announced a Precision Medicine Initiative
during the annual State of the Union address on 20 January. “I want the
country that eliminated polio and mapped the human genome to lead a new
era of medicine, one that delivers the right treatment at the right
time,” he said. The effort will aim to match genomic and other data
with patient health records to discover new treatments and tailor
existing ones. Obama is expected to request hundreds of millions of
dollars to fund research for the initiative at the US National
Institutes of Health, which will have a key role. See page 540 for more.


Vaccine price jump

The medical charity Médecins Sans Frontières in Geneva, Switzerland,
called on drug companies to cut the price of vaccines in a 20 January
report. A full programme of childhood vaccines now costs 68 times more
than it did in 2001, on the basis of the best prices available to
low-income countries. Much of the price jump is due to expensive
vaccines that protect against bacterial pneumococcal diseases, which
have been rolled out in many low- and middle-income countries in
recent years.


Sea-science report

The US National Science Foundation (NSF) should immediately slash
spending on marine hardware, says a report released by the National
Research Council on 23 January. The report, which is intended to guide
US oceanography for the next decade, says that basic
ocean research at the NSF


New island born from volcanic eruption

A volcanic eruption in the Pacific archipelago of Tonga has created a
new island. Steam and ash began rising last month between the islands of
Hunga Tonga and Hunga Ha‘apai, about 60 kilometres north of the
kingdom's main island of Tongatapu. The newborn isle breached the
surface around 20 December.
In mid-January volcanologists measured it at
about 1.8 kilometres long and 100 metres high.
A similar island formed during the volcanos 
last eruption in 2009, but was washed away by 
ocean waves within weeks. 


is losing out to the rising 

costs of infrastructure. It 

calls for a 20% cut to the 
operating budget of the Ocean 
Observatories Initiative, a 
marine monitoring effort due 
for completion in May. The 
initiative is currently expected 
to cost up to US$59 million 
per year. See page 538 for 
more. 


India energy deal 
US President Barack Obama and Indian Prime Minister Narendra Modi
pledged on 25 January to expand cooperation on clean energy and
tackling climate change. During Obama's visit to New Delhi, the two
leaders agreed to work towards an international climate agreement as
well as a separate plan to reduce emissions of hydrofluorocarbons, a
class of powerful greenhouse gases that are often used as
refrigerants. India has committed to installing 100 gigawatts of solar
energy by 2022, but has made no concrete promises about curbing
greenhouse-gas emissions.


Marijuana research 


A leading medical academy has called for the US government to loosen
restrictions on marijuana, so that researchers can study its potential
medical benefits for sick children. In a 26 January policy statement,
the American Academy of Pediatrics said that marijuana should no
longer be classed alongside heroin and ecstasy as a schedule 1 drug
with no accepted medical use. It argues that the drug should be
relabelled as schedule 2, placing marijuana in the same category as
potentially addictive painkillers such as morphine and codeine.
Several federal agencies are already considering such a change.


RESEARCH


Tiger boom

Tigers are mounting a comeback in India. Last week, the country’s
government announced a 30% increase in the population of Bengal tigers
(Panthera tigris tigris; pictured), from 1,706 individuals counted in
a 2010 census using thousands of camera traps, to 2,226 recorded last
year. A 2004 survey documented just 1,411 tigers. The Indian
government attributed the increase to efforts to limit poaching and
minimize human encroachment on the feline’s habitat. An estimated 70%
of the world’s tigers now reside in India.


Fish from the cold

A marine ecosystem has been discovered beneath the Ross Ice Shelf,
which extends hundreds of kilometres off the coastline of Antarctica.
After drilling through 740 metres of ice, researchers sent a remotely
operated vehicle to explore the area around the borehole on
16 January. The vehicle photographed fish and marine crustaceans known
as amphipods thriving in the pitch-dark, −2 °C waters, the nearest to
the South Pole that such an ecosystem has been found. See
go.nature.com/jomocy for more.


TREND WATCH

Venture capitalists invested more in the life sciences last year than
they have since 2008, according to data released on 16 January by
PricewaterhouseCoopers and the US National Venture Capital
Association. Investment in biotechnology and medical-devices companies
rose by 29% on 2013, driven by deals in fields such as cancer
immunotherapy. Public investors also embraced health care, with
102 companies launching an initial public offering in 2014, compared
with 54 in 2013.

BOOM TIME FOR BIOTECH
Venture capitalists, who have showered money on software start-ups
in recent years, gave life sciences a shot in the arm in 2014.


EVENTS


No climate hoax

The US Senate resoundingly affirmed the existence of global warming
during a 21 January debate over legislation that would authorize
construction of the Keystone XL oil pipeline. Lawmakers voted 98-1 to
adopt a 16-word, non-binding amendment: “It is the sense of the Senate
that climate change is real and not a hoax.” Another amendment, to pin
some of the blame for climate change on human activities, failed by
50-49. See go.nature.com/tqjjdn for more.


Doomsday clock
The Bulletin of the Atomic Scientists pushed the hands of its Doomsday
Clock two minutes closer to midnight on 22 January, symbolizing that
the planet is closer to global disaster. The physicist-founded
magazine, based in Chicago, Illinois, cited unchecked climate change
and the threat of outsized nuclear-weapons arsenals as the reasons for
the shift. The clock now stands at three minutes to midnight: the
closest it has been since 1984, but further away than in 1953 after
the hydrogen bomb was first tested, when the clock stood at two
minutes to midnight.


Open journals 

The University of California Press plans to establish two open-access
journals, it announced on 20 January. A ‘mega journal’ called Collabra
will publish research articles in the biomedical, life, environmental
and social sciences, and it will be unusual in paying reviewers and
editors for their time. They will have the option of donating the
money to fund other papers published by Collabra or giving it to their
own institution to pay for open access. The second journal, Luminos,
will publish research monographs.


Space investments 
SpaceX added US$1 billion to its coffers with financing from Google
and the investment firm Fidelity, it announced on 20 January. The
company, of Hawthorne, California, is one of several aerospace firms
currently carrying cargo to the International Space Station for NASA,
and it is working towards flying US astronauts there in the future.
Company founder Elon Musk wants to send astronauts to Mars one day.
For now, however, the investments will support SpaceX’s plans to
develop reusable rockets and manufacture fleets of small satellites,
which are likely to provide Internet access.


regulations and the growth of renewable 
energy sources, especially wind power. 

But first, the regulations must withstand 
inevitable industry lawsuits intended to 
weaken or overturn them. “The power-plant 
rule is really pivotal to Obama's legacy, and it is 
going to face tough legal scrutiny,” says David 
Victor, director of the Laboratory on International 
Law and Regulation at the University of 
California, San Diego. If it succeeds, he adds, 
“it could reverse the position of the United 
States internationally”. 

Although Obama was unable to secure 
climate legislation during his first term, his 
administration did achieve gains in the wake 
of the recession. It secured billions of dollars 
in stimulus funding for clean energy, effi- 
ciency measures and green infrastructure, as 
well as establishing significant new standards 
for vehicle emissions and fuel economy. That, 
combined with the economic slowdown and 
the shift away from coal in the electricity sector, 
means that US greenhouse-gas emissions have 
already decreased by around 10% since 2005. 

In theory, the administration still has both 
the time and the means to reduce emissions 
enough for the United States to meet its 
international commitments, says Kennedy. 
“They will have to take very serious action, 
but the tools that they have available to them 
should allow them to do it.” 

Republicans have vowed to challenge Obama 
at every turn. They started the current session 
with a debate on legislation to approve the 
controversial Keystone XL pipeline, which 
would carry crude oil from the tar sands 
of Alberta, Canada, to refineries on the 
US Gulf Coast. The House of Representatives 
quickly passed a bill to approve the pipeline, 
but partisan disagreements have delayed a 
Senate vote. Obama has promised to veto the 
legislation. 

Although the pipeline would have a small 
effect on global greenhouse-gas emissions, it 
has become a symbolic issue for both sides 
of the climate debate. On 21 January, Senate 
Democrats used the Keystone fight to confront 
Republicans on their views about climate by 
putting to a vote declarations about human 
involvement in global warming (see Nature 
http://doi.org/zpx; 2015). Fifteen Republicans 
supported an amendment to the Keystone 
bill stating that climate change is affected by 
human activity, and five voted for an amendment 
stating that climate change is “significantly” 
affected by humans. 

Although neither amendment passed, those 
votes are a sign that Republicans are feeling 
pressure and may warm to certain climate 
solutions in future, says Bob Inglis, a Republican 
former member of the House who heads 
the Energy and Enterprise Initiative, a think 
tank that advocates for conservative environmental 
solutions at George Mason University 
in Fairfax, Virginia. Although Inglis understands 
why Obama has chosen to sidestep 
Congress and address climate change with 
regulations, he says that the president still has 
a potential opportunity to secure his environmental 
legacy by striking a grand legislative 
bargain with his Republican opposition. 

“Obama is in a box,” says Inglis, “but he 
could get out of that box if he were a little bit 
bolder.” SEE EDITORIAL P. 527 


SOLAR SYSTEM 


Philae hunt hangs in the balance 


Rosetta mission would have to sacrifice other science to search for comet lander. 


BY ELIZABETH GIBNEY 


The lost space probe Philae, which made 
history after it landed on a comet last 
November, is posing a dilemma for 
scientists at the European Space Agency (ESA). 
They have what is probably their last chance 
to change the path of Philae’s parent craft, 
Rosetta, to hunt for the lander, which went 
missing shortly after it touched down on 
comet 67P/Churyumov-Gerasimenko. 
But the shift would also mean sacrificing 
some of Rosetta’s long-planned science 
observations. 

The agonizing choice comes as the mission 
team published its first batch of papers 
from observations made after Rosetta entered 
into orbit around 67P last August, reporting 
a varied landscape and hinting at the 
comet's origins. 

Philae has been silent since its batteries 
ran out just days after its bumpy landing on 
12 November. On the basis of images of its 
initial bounces and data from radio instruments, 
Philae’s position has been narrowed 
down to a 20-metre by 200-metre strip. But 
efforts to find the 1-metre-wide lander in 
high-resolution pictures taken by Rosetta from a 
distance of about 20 kilometres have so far failed. 


NEARLY WEIGHTLESS 

The resulting effect of gravitational potential and 
centrifugal forces, mapped on comet 
67P/Churyumov-Gerasimenko, is revealed to be 
greatest on the lobes and weaker in the neck region. 

[Map scale in Newton metres/kilogram: -0.27, -0.32, -0.36, -0.41, -0.45] 

SOURCE: H. SIERKS ET AL. SCIENCE HTTP://DOI.ORG/ZP2 (2015) 


Project scientists are debating whether to 
send Rosetta, which is still orbiting the comet, 
down to an altitude of 6 kilometres, over the 
patch where Philae is thought to be. It would 
be the closest that the craft has ever been to 
67P. But Rosetta has limited fuel. Any attempt 
to look for Philae would mean scrapping a 
different flyby, which would offer the chance 
to image the comet in a shadow-free shot that 
should reveal fine details about the surface 
structure and composition. 

As the comet approaches the Sun, growing 
cometary surface activity in the form of jets of 
gas and dust also makes it increasingly risky 
for Rosetta to approach. If Rosetta is to 
stick to the original flyby plan, scheduled 
for 14 February, the craft will not 
come as close to 67P until 2016, says 
the mission’s flight director Andrea 
Accomazzo, after the comet has swung 
around the Sun and headed back out 
to space. 


SURFACE SEARCH 

There are scientific benefits to pinpointing 
Philae’s location, says Wlodek Kofman, principal 
investigator on Rosetta's CONSERT 
(Comet Nucleus Sounding Experiment by 
Radiowave Transmission) experiment, which is 
designed to send radio waves between the 
parent craft and Philae to study the comet's 
interior. Not knowing the lander’s exact 
location makes it much harder to process the 
data that scientists have already received and to 
generate accurate results, he says. Spotting the 
lander would also help to determine its exact 
location and angle, and to predict how likely 
it is to come back to life in the coming months 
as the comet nears the Sun and its solar panels 
begin to receive more light, Kofman says. 

NATURE.COM 
For the best images from Rosetta’s data haul, see: go.nature.com/rrihsj 

The decision is not an easy one, says Holger 
Sierks, who is principal investigator on 
Rosetta’s OSIRIS (Optical, Spectroscopic, and 
Infrared Remote Imaging System) instrument. 

The mission has already produced a haul 
of results, which were published in a series of 
papers in Science on 22 January (see Nature 
http://doi.org/zpz; 2015). Using data from OSIRIS 
and the Radio Science Investigation instrument, 
Sierks and his collaborators calculated the 
gravity on the rubber-duck-shaped comet and 
created a map (see ‘Nearly weightless’) that also 
takes into account the centrifugal force caused 
by the comet's rotation (H. Sierks et al. Science 
http://doi.org/zp2; 2015). The resulting force is 
greatest on top of the lobes, but it is about six 
times weaker in the neck region, where dust can 
lift off more easily. The team also used the data 
to calculate the comet’s density, finding that the 
body is relatively fluffy and porous, with a 
density of around half that of water, giving clues 
to its structure and strength. 

The researchers described three-metre-wide 
pebble-like features that are found 
all over the comet, which they nicknamed 
“goosebumps”. Sierks says that the shapes 
could hint at the size of the grains of dust and 
ice that first clumped together in the early 
Solar System before forming larger bodies. 
“The hypothesis is these might be the building 
blocks of comets,” he says. 

In another of the papers, OSIRIS data 
enabled Sierks and his collaborators to classify 
the geography of the comet’s surface on the 
basis of terrain types. These include fractures, 
possible impact craters and an array of dunes 
and ripples that may have been formed by gas 
travelling around the surface, like wind shaping 
sand in a desert (N. Thomas et al. Science 
http://doi.org/zp3; 2015). 

The final word on whether to send Rosetta 
to look for Philae rests with ESA. Kofman says 
that an informal vote among Rosetta scientists 
came down narrowly on the side of doing it. As 
Nature went to press, the agency was thought 
to be leaning towards sticking to its original 
agenda, because looking for Philae would 
mean too much upheaval for the mission. 

If ESA decides against a mission shift to hunt 
for Philae, the team could still get lucky: it may 
find clues as to the lander’s whereabouts either 
in existing images or in new shots taken from 
flybys between 20 km and 50 km away in the 
coming months. Finding Philae in this kind 
of flyby is not impossible, says Accomazzo, 
“but it would be sheer luck”. 


PSYCHOLOGY 
Clash over ‘smart unconscious’ 


Report examining decisions made while distracted adds to 
controversy about the power of the unconscious. 


BY ALISON ABBOTT 


If you have to make a complex decision, 
will you do a better job if you absorb 
yourself in, say, a crossword puzzle 
instead of ruminating about your options? 
The idea that unconscious thought is sometimes 
more powerful than conscious thought 
is attractive, and echoes ideas popularized 
by books such as writer Malcolm Gladwell’s 
best-selling Blink. 

But within the scientific community, 
‘unconscious-thought advantage’ (UTA) has 
been controversial. Now Dutch psychologists 
have carried out the most rigorous study yet 
of UTA, and find no evidence for it. 

Their conclusion, published this week in 
Judgement and Decision Making, is based on 
a large experiment that they designed to provide 
the best chance of capturing the effect 
should it exist, along with a sophisticated 
statistical analysis of previously published data. 

The report adds to broader concerns 
about the quality of psychology studies and 
to an ongoing controversy about the extent 
to which unconscious thought in general 
can influence behaviour. “The bigger debate 
is about how clever our unconscious is,” 
says cognitive psychologist David Shanks of 
University College London. “This carefully 
constructed paper makes a great contribution.” 
Shanks published a review last year 
that questioned research claiming that various 
unconscious influences, including UTA, 
affect decision making. 

A typical study probing UTA asks subjects 
to make a complex decision, such as choosing 
a car or a computer, after either mulling over 
a list of the object's attributes or viewing the 
list quickly and then engaging in a distracting 
activity such as a word puzzle. However, 
such studies have drawn different conclusions, 
with about half of those published so 
far reporting a UTA effect and the other half 
finding none. 

Proponents of the theory claim that the 
effect is exquisitely sensitive to experimental 
variations, and often attribute the negative 
results to the fact that many research groups 
varied elements of the set-up, such as the 
choice of puzzle used for the distraction. 
Critics say that the positive results came 
from having too few participants in the 
experiments. 

Psychologists Mark Nieuwenstein and 
Hedderik van Rijn at the University of 
Groningen in the Netherlands set out with 
their colleagues to determine which explanation 
was correct. 

They asked 399 participants, around ten 
times more than the typical (median) sample 
size in other studies, to choose between 
either 4 cars or 4 apartments on the basis of 
12 desirable or undesirable features. They 
incorporated the full list of conditions that 
UTA proponents had reported as yielding 
the strongest effect, such as the exact type of 
puzzle used as a distraction. They found that 
the distracted group was no more likely 
than the deliberating group to choose the 
most desirable item. 

The scientists then reanalysed 60 of 
the 81 experiments described in the 32 
UTA papers published before April 
2014. For this ‘meta-analysis’, they excluded 
experiments that had insufficient data for 
analysis or that deviated from conditions 
that are reported as likely to elicit UTA (only 
one of these experiments had claimed a UTA 
effect). They also included the results of their 
own study. When they applied a rigorous 
statistical meta-analysis, they found no 
significant UTA effect. 

“Psychologists have historically prided 
themselves on their command of statistics,” 
says psychologist Jonathan Baron at the 
University of Pennsylvania in Philadelphia, the 
editor of Judgement and Decision Making. 
But this study shows that many studies in the 
past were poorly designed. He adds: “If UTA is 
out there, it can’t be captured in experiments 
designed in the lab.” 

Psychologist Ap Dijksterhuis at Radboud 
University in Nijmegen, the Netherlands, 
who first described unconscious-thought 
theory, which predicts UTA, in 2004, says: 
“It is certainly true that psychology has 
improved quite a bit in recent years when it 
comes to analysing data. And yes, in the past, 
suboptimal analyses have been applied.” But 
he does not accept the findings of the 
meta-analysis. He says that it would have produced 
different conclusions if the researchers had 
included all previous UTA experiments, rather 
than excluding some and relying on a subset. 
He adds that “the evidence for UTA is growing 
quickly” and is widely accepted. 

UTA is not the only ‘smart-unconscious’ 
claim to come under scrutiny. For example, 
experiments carried out under the “Many 
Labs” Replication Project, which coordinates 
labs internationally to repeat psychological 
studies in order to validate their 
claims, as well as several separate studies, 
have challenged another psychological concept, 
social priming. Under social priming, 
certain behaviours are claimed to be modified 
unconsciously by previous exposure to 
stimuli, such as an American flag, or thinking 
about money. 

Other doubts raised about unconscious 
thought include its role in some types of deci- 
sion making under uncertainty. 

In spite of the most recent findings, Brian 
Nosek, a psychologist at the University of 
Virginia in Charlottesville who co-launched 
Many Labs, says that he remains optimistic 
about the theory underlying UTA. “I 
would be surprised if unconscious-thought 
theory did not hold up, because it fits with 
contemporary theories,” he says. 


Shanks agrees that the debate over 
unconscious-thought theory is probably not over. 
“How we make decisions, and how we might 
make them better, has practical and intellectual 
importance,” he says. “If there is any 
evidence that distraction or unconscious 
rumination helped, we'd want to know 
about it, but the conclusions are so far very 
premature.” 
OCEANOGRAPHY 


US ocean sciences told to 
plot fresh course 


Major report calls for cuts in infrastructure funding to increase spending on science. 


BY ALEXANDRA WITZE 


Faced with the rising costs of going to 
sea, the ocean-sciences division of the 
US National Science Foundation (NSF) 
should slash what it spends on marine hardware 
to fund more research, says a major 
report by the US National Research Council. 
It proposes making the biggest cut to the showcase 
US$386-million Ocean Observatories Initiative 
(OOI), which after years of construction 
is just months away from being finished. 

The report's authors suggest that the NSF 
should cut 20% of the OOI’s operations budget, 
and reduce its contributions to the international 
scientific ocean-drilling programme and the 
US academic research fleet. If the agency takes 
that advice, it could free up enough money for 
US oceanographers to begin to reclaim much 
of their lost science, as well as expand partnerships 
with international researchers. 

“It's an exciting time to be in ocean science,” 
says Shirley Pomponi, an oceanographer at 
Florida Atlantic University in Fort Pierce and 
co-chair of the report committee. “But we need 
to take steps to make that better.” 

US oceanography has been in trouble for 
a while. The US Navy paid for the bulk of 
the country’s academic oceanographic work 
until the 1960s, after which the NSF began 
shouldering more of the burden. But even as 
filmmaker James Cameron, flush with private 
money, explored the Pacific Ocean’s Mariana 
Trench with a handful of scientists in 2012, 
most research oceanographers found themselves 
with fewer ways to get to sea. 


SINKING SCIENCE 

As the US National Science Foundation (NSF) has increased its spending on ocean 
hardware, such as ships and instruments, its funding for ocean science has fallen. 
[Chart annotations: ‘Infrastructure outlay overtakes science’; ‘Projected spending’. Years shown: 2010 to 2018.] 
SOURCE: NRC 


EIGHT PRIORITIES 

Over the past decade, the NSF’s 
ocean-infrastructure expenses have risen by 18%, 
even as the ocean-science division's 
inflation-adjusted budget dropped by more than 10%, 
to just under $350 million annually. In 2013, 
the division started to spend more on infrastructure 
than it did on science (see ‘Sinking 
science’). That is when the NSF asked for outside 
advice on how to cope. 

The report, which was published on 
23 January, lays out eight science priorities for 
the next decade, including studies of sea-level 
change, marine biodiversity, earthquakes and 
tsunamis, and life beneath the sea floor. Unusually, 
it also suggests how to pay for the studies: 
an immediate 10% cut in infrastructure, 
spread unequally among three programmes, 
followed by a similar or larger cut over the next 
five to ten years. “This document gives them 
the flexibility to make some really hard decisions,” 
says Samantha Joye, an oceanographer 
at the University of Georgia in Athens. 

The smallest suggested cut, just 5%, applies 
US ocean sciences told to steer a new course 


Major report calls for cuts to infrastructure, including fledgling Ocean Observatories Initiative, 
to increase spending on science. 
Alexandra Witze 


23 January 2015 


Lamont-Doherty Earth Observatory 


The US National Science Foundation should cut spending on research ships by 5%, a new report says. 


Faced with rising costs of going to sea, the ocean-sciences division of the US National Science 
Foundation (NSF) should immediately slash what it spends on marine hardware, says a new report. It 
suggests making the biggest cut to the flagship US$386-million Ocean Observatories Initiative (OOI), 
which after years of construction is just months away from being finished. 


The report, released on 23 January by the US National Research Council, is likely to guide US 
oceanography for years to come. It is the first formal attempt to address what many researchers have 
grumbled about for years: that basic ocean science at the NSF is losing out to the rising costs of 
infrastructure. 


Back to basics 

To get science funding back to its historical level, the report’s authors suggest slashing 20% of the 
OOI’s operations budget, and making smaller cuts to the NSF contributions to the scientific 
ocean-drilling programme and the US academic research fleet. If the agency takes that advice, it could 
free up enough money for US oceanographers to begin to reclaim much of their lost science, as well as 
expand partnerships with international researchers. 


“It’s an exciting time to be in ocean science,” says Shirley Pomponi, an oceanographer at Florida 
Atlantic University in Fort Pierce and co-chair of the report committee. “But we need to take steps to 
make that better.” 


US oceanography has been in trouble for a while. The US Navy paid for the bulk of the country’s 
academic oceanographic work until the 1960s, after which the NSF began shouldering more of the 
burden. Even as filmmaker James Cameron, flush with private money, explored the Pacific Ocean's 
Mariana Trench with a handful of scientists in 2012, most research oceanographers found themselves 
with fewer ways to get to sea. 


Over the past decade, the NSF’s ocean infrastructure expenses rose 18% as the ocean-science 
division’s inflation-adjusted budget dropped by more than 10%, to just under $350 million annually. In 
2013, the division started spending more on infrastructure than it did on science. That is when the NSF 
asked for outside advice on how to cope. 


Eight priorities 

The report lays out eight science priorities for the next decade, including sea-level change, marine 
biodiversity, earthquakes and tsunamis, and life beneath the sea floor. Unusually, it also suggests how 
to pay for the studies: a 10% cut in infrastructure immediately, spread unequally among three 
programmes, followed by a similar or larger cut over the next five to 10 years. “This document gives 
them the flexibility to make some really hard decisions,” says Samantha Joye, an oceanographer at the 
University of Georgia in Athens. 


The smallest suggested cut, just 5%, applies to the NSF contribution to the 20-vessel research fleet, 
because ships enable more of the future science priorities. Even so, the report recommends that the 
agency build no more than two new ‘regional class’ vessels in the coming years; it had been 
considering building three. 


The middle cut, of 10%, is recommended for the scientific ocean-drilling programme, in the guise of 
the JOIDES Resolution drillship, which has already endured a number of cutbacks. 


Finally, Pomponi and her colleagues suggest that 20% should be cut from the OOI's operations budget, 
which will run between $55 million and $59 million when it begins full operations this year. They note 
that the OOI is made of many components, some of which do a better job than others at studying the 
key science priorities. 
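
The arithmetic behind the proposed OOI cut can be sketched in a few lines of Python. This is only an illustration, assuming that the 20% reduction applies uniformly to the annual operations budget of $55 million to $59 million quoted above; the function name `apply_cut` and the constants are hypothetical and do not come from the report.

```python
def apply_cut(budget_musd, cut_fraction):
    """Return (savings, remaining), both in millions of US dollars."""
    savings = budget_musd * cut_fraction
    return savings, budget_musd - savings


# Figures taken from the text; applying the cut uniformly is an assumption.
OOI_BUDGET_RANGE_MUSD = (55.0, 59.0)
OOI_CUT_FRACTION = 0.20

if __name__ == "__main__":
    for budget in OOI_BUDGET_RANGE_MUSD:
        savings, remaining = apply_cut(budget, OOI_CUT_FRACTION)
        print(f"${budget:.0f}M budget: saves ${savings:.1f}M, leaves ${remaining:.1f}M")
```

Under that assumption, the cut would free about $11 million to $11.8 million a year and leave the OOI with roughly $44 million to $47 million for operations.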


For instance, a cabled sea-floor observatory off the coast of Oregon addresses the risks of underwater 
earthquakes and tsunamis. Two moveable arrays of instrumented moorings — one off the US east 
coast and one off the west coast — tackle questions such as regional sea-level change. But four deep- 
water sites — two at high northern latitudes and two in the far south — are much less crucial for the 
eight priorities, says the report. It argues that at least one of the southern sites could be sacrificed 
without much scientific loss, while others may need only to be instrumented for two or three years to 
understand ocean change rather than the 25-year planned lifetime of the OOI. 


Seeking balance 

“We didn’t say the OOI was not important,” says report co-author Melbourne Briscoe, an oceanographer 
who has previously worked for the government and now runs an environmental consulting company in 
Alexandria, Virginia. “It’s just that it wasn’t as relevant to that set of eight questions as other pieces of 
infrastructure were.” 


Officials of the Consortium for Ocean Leadership, the Washington, DC-based organization that is 
building the OOI for the NSF, declined to comment, saying that they needed more time to digest the 
report and its implications. 


Richard Murray, who became director of NSF's ocean sciences division this month, says that his top 
priority is to figure out how to move forward with the new community advice. "We're looking at this report 
very, very seriously," he says. "We hear very clearly the committee's recommendation that something 
needs to be done." 


Mitchell Lyle, an oceanographer with Oregon State University in Corvallis who was not involved in the 
report, says that the field needs to stop quibbling over how to spend ever-smaller slices of the budgetary 
pie, and instead take a fresh look at what US oceanography can really afford. "Funding levels are now 
dropping below a sustainable level to support the number of ocean scientists that we now have," he 
says. Oceanographers might be better served by looking at more dramatic options for saving federal 
research dollars, he says, such as possibly closing some of the current oceanographic institutions.


Mitch Lyle » 2015-01-25 11:58 PM

As the person quoted in the last paragraph, I would like to clarify my conversation: the point I
was trying to make is that oceanographic institutions are already being damaged by the current 
and projected funding climate. The most likely path that the report describes is continued flat 
funding for ocean sciences. While oceanography can survive on flat funding for a while and the 
report describes ways it can, ocean science has been weakening over the last decade and will 
continue to weaken as the field expands but funding does not. Despite important new science to 
pursue, new technology will not reduce the expense of data collection to the point that we can 
afford new people for doing the science. OOI, IODP, and previous long term plans for OCE 
arose from the assumption at the turn of the century that there was strong bipartisan support for 
doubling the NSF budget. The NSF budget did not double and the budget in ocean sciences 
shrank by 20%. At the present time we are also losing skills and expertise through retirement 
and loss of positions, and are not hiring new scientists and technicians rapidly enough to 
maintain a healthy science. This trend will continue unless additional funding appears. NSF 
cannot close oceanographic institutions, and I didn’t suggest that it should. I did point out that
one of the current weaknesses in the field is that the numbers of institutions that exist are too 
great for the dollars available, and all institutions will tend to grow weaker and expertise will tend 
to become more diffuse under flat funding. There is a strong tendency now for technical groups 
to lose critical mass of people and lose functionality. If we seriously believe that there will be no 
new funding, we need to rethink how we do oceanography. How do we get healthy again?


Some children with cystic fibrosis are eligible for targeted treatments.


HEALTH CARE 


Precision-medicine 
plan raises hopes 


US initiative highlights growing focus on targeted therapies. 


BY SARA REARDON 


With the pipeline of conventional
drugs drying up, researchers are
increasingly attempting to customize
treatments on the basis of a person’s
genetics or environment. Now the US government
wants to get in on the act.

During his State of the Union address to 
Congress on 20 January, President Barack 
Obama announced a programme called the 
Precision Medicine Initiative. “I want the 
country that eliminated polio and mapped 
the human genome to lead a new era of medi- 
cine — one that delivers the right treatment at 
the right time,” he said. 

The White House is remaining tight-lipped 
about the details of the programme, declin- 
ing to answer questions from Nature — as is 
the US National Institutes of Health (NIH), a 
key partner in the effort. But Kay Holcombe, 
senior vice-president for science policy at the 
Biotechnology Industry Organization (BIO) 
in Washington DC, says that her conversa- 
tions with the NIH suggest that the initiative 
will seek to match genome information with 
many other data types, such as health records 
and blood-test results. 

The agency seems to have been planning
the effort for some time, listing precision
medicine as one of its four priorities in
its 2015 budget proposal; another was big
data. Other government agencies are also
expected to participate, as may some private
companies. Further details, including
the cost, are likely to trickle out as Obama
prepares his budget request for fiscal
year 2016, which is due to be released
in February.

A major question is whether the plan
will run alongside or merge with a similar
proposal being discussed by members
of the US House of Representatives’ Energy
& Commerce Committee. The committee’s
21st Century Cures plan seeks to speed up
the translation of research advances into
treatments, and personalized medicine is one
potential element of the effort. Law-makers
are expected to release a first draft of that
proposal shortly.

Both the White House effort and the House
plan would be extremely expensive, but they
might not be as difficult to carry out as they
first seem. Rather than recruiting all of their
participants anew, Holcombe says that both
initiatives could collect data and recruit
participants from ongoing longitudinal studies.
These include the Million Veteran Program at
the US Department of Veterans Affairs, which
seeks to understand how genes affect health,
and the NIH’s 67-year-old Framingham Heart
Study at Boston University in Massachusetts,
which aims to identify risk factors for heart
disease.


MELIE-BENOIST/BSIP/CORB 


SCANT DETAILS 

If the federal programme takes the form of 
a public-private partnership, then private 
insurance companies and health systems 
could contribute data as well. David Ledbetter, 
chief scientific officer at Geisinger Health 
System in Danville, Pennsylvania, says that 
his company might be willing to join such 
an effort. Geisinger, a network of hospitals 
and clinics, aims to recruit up to 200,000 of 
its 3 million customers to have their exomes 
— parts of the genome that code for proteins 
— sequenced and integrated with their health 
records. The company now has completed 
sequences from about 20,000 people, and it 
is preparing to provide each person with an 
analysis of his or her health risks. 

“My personal attitude is always to try to
collaborate, rather than duplicate and compete
in an inefficient way,” Ledbetter says.

Still, standardizing data collection and 
patient recruitment across the country will be 
extremely difficult, especially if ongoing stud- 
ies are rolled into the effort. Such complexi- 
ties sank the NIH’s 100,000-person National 
Children’s Study, which sought to track envi- 
ronmental influences on children’s health; the 
agency cancelled the project last month after 
14 years of delays. 

Informed consent and data security will 
present additional challenges. The roll-out 
of the UK National Health Service's care.data 
project, which would make health informa- 
tion from most patients in England available 
for research, has been delayed for several 
months for this reason. 

Nevertheless, with personalized medicine 
in vogue, studies are likely to continue to 
grow in both number and magnitude, and 
in both the public and private sectors. The 
Precision Medicine Initiative could once 
again pit NIH director Francis Collins, who 
headed the Human Genome Project, against 
his old private-sector rival, Craig Venter. Last 
March, Venter launched a company called 
Human Longevity in San Diego, Califor- 
nia, with the goal of sequencing one million 
human genomes by 2020. The effort is gain- 
ing steam: on 14 January, Venter announced 
that his company would be sequencing tens 
of thousands of genomes for Genentech, a 
biotechnology company based in South San 
Francisco, California, that is searching for 
new drug targets.


Manot Cave, near Israel’s Sea of Galilee, was rediscovered when a bulldozer opened an entrance to it in 2008.


Neanderthals gain human neighbour 


Cranium discovery shows that Homo sapiens was living in Middle East 55,000 years ago. 


BY EWEN CALLAWAY 


A 55,000-year-old incomplete skull
found in Israel may belong to a human
group that interbred with Neanderthals.
Discovered deep in a cave by amateur
speleologists, the partial cranium also fills a
major gap in the fossil record of Homo sapiens’
journey from Africa to Europe.

“Here we actually hold a skull of a human
being that was living next to the Neanderthals,”
says Israel Hershkovitz, the leader of a study
published today in Nature (I. Hershkovitz et al.
Nature http://dx.doi.org/10.1038/nature14134;
2015). “Potentially he is the one that could
interbreed with the Neanderthals,” says Hershkovitz,
who is a physical anthropologist at
Tel Aviv University in Israel.

Genome studies of Neanderthals (Homo 
neanderthalensis) and of both ancient and 
contemporary H. sapiens suggest that the two 
species interbred somewhere in the Middle 
East between 50,000 and 60,000 years ago 
(Q. Fu et al. Nature 514, 445-449; 2014). But 
the problem with this idea is that no remains of 
anatomically modern humans have been dis- 
covered in the Middle East from this crucial 
period, after H. sapiens left Africa and before 
it colonized Europe and Asia. 

In 2008, a bulldozer clearing land for a 
development near the Sea of Galilee in northern 
Israel revealed an opening to a limestone cave 
that had been sealed for more than 15,000 years. 
Amateur speleologists were the first to explore 
the cave, and they spotted the battered
bone, the top portion of a skull, resting on
a ledge. The Israel Antiquities Authority soon
launched a complete survey of Manot Cave,
finding buried stone tools at several spots that
are still being excavated.

The ‘skullcap’ fills a gap in the path that modern humans took as they spread out of Africa.

The skull was unquestionably from H. sapiens,
says Hershkovitz: it was similar in shape
to those of earlier African and later European
humans. A patina of calcite coated the fragment,
and the researchers used radioactive uranium
in the mineral to date the bone to about 55,000
years old. That means that “the Manot people
are probably the forefathers of the early Palaeolithic
populations of Europe”, Hershkovitz says.

The Manot people are also a leading 
candidate for the humans that bred with 
Neanderthals — exploits that have given all 
of today’s non-African humans a sliver of 
Neanderthal heritage. The Manot Cave is not 
far from two other sites that held Neanderthal 
remains of a similar age. “The southern Levant 
is the only place where anatomically modern 
humans and Neanderthals were living side by 
side for thousands and thousands of years,” 
Hershkovitz says. The ultimate proof would be 
to look for the presence of Neanderthal ancestry
in DNA from the skull, but the region’s
balmy temperatures mean that ancient DNA
is unlikely to have been preserved.

Jean-Jacques Hublin, a palaeoanthropologist 
at the Max Planck Institute for Evolutionary 
Anthropology in Leipzig, Germany, agrees 
that the chances of recovering DNA from 
the skull fragment are slim. But he hopes that 
further excavations will find human remains 
that have stayed cool enough to still contain 
DNA. These digs might also connect the skull 
to stone tools and other relics of daily life, 
which could strengthen the Manot skull’s link
to early Europeans. The artefacts uncovered 
so far are thought to be much younger than 
the skull. “We have a skull, and we have a site 
where there is some archaeology, but there is 
no link between the skull and the archaeology. 
It’s a bit annoying,” Hublin says. 

“This specimen is really important and 
exciting, as — assuming the dating is cor- 
rect — it shows for the first time that modern 
humans existed in the Near East at the same 
time as Neanderthals,” says Katerina Harvati, a 
palaeoanthropologist at the University of Tübingen
in Germany. “Until now we had no evidence
that the two even coexisted in this region during
this time period. So this is a crucial piece of the
puzzle.”


The News Feature ‘Laser focus’ (Nature 517, 
430-432; 2015) gave the wrong amount for 
the funding behind the Center for Adaptive 
Optics. The grant was in fact for around 
US$40 million over 10 years.


NOT YOUR
AVERAGE
TECHNICIAN


Research relies on unsung
heroes working behind the
scenes, and some of them
have rather unusual jobs.


The glass-blower 
in the bush 


BY MICHAEL HOPKIN 


The West Australian town of Jarrahdale (population 1,082) seems an
unlikely place to go if you need to get your hands on some highly
technical glassware in a hurry. Turn off the main street with its tavern,
general store and logging museum, and the road quickly becomes dirt
punctuated by sun-faded letter boxes, wonky fences and dusty driveways.

But it is down one of these driveways that you'll find Sarah Davis, who
has been running a scientific glass-blowing business since 2010. Working
from her garage, she provides local researchers (mostly university
chemists in nearby Perth) with handmade flasks, tubes, condensers
and bespoke items that don't even have a name.

“If they want a simple condenser, I can whip that up in half an hour,”
says Davis, referring to the glass tube used to cool hot vapours. “I get
people ringing up saying, ‘I’ve broken this’, and generally I get it out for
them the next day.” For scientists who live in one of the world’s remotest
cities, this makes Davis an extremely useful person to have around. The
alternative is to wait at least six weeks for orders to be made and shipped
from Sydney. “Sometimes she comes in after a couple of days and says,
‘I’ve finished’, and I say, ‘Already?’” says Grant Cope, who orders from
Davis as part of his job as stores officer for the chemistry department at
Curtin University in Perth.

Many big research institutions have their own scientific glass-blowers,
and that is what Davis was doing until five years ago, working
as the in-house glass-blower on the University of Western Australia
(UWA) campus in Perth. But in 2010, when she was laid off in a round of
university cutbacks, she decided to go it alone, putting her outbuildings
into service as her workshops.

“What could be more Australian than working in your garage and 
seeing a kangaroo come hopping down the drive?” she says. This is, in 
fact, routine. She also shares the garage with two possums that like to 
take naps in the rafters when the temperature creeps past 40°C, as it 
tends to do in summer. But in the worst of the weather, Davis is less likely 
to be toiling over hot glass: she is a volunteer firefighter and is regularly
called up to deal with bush fires.

The rustic setting belies the fact that Davis's
craft is a highly technical practice, bearing very
little resemblance to traditional glass-blowing.
For a start, there is not much blowing involved.

She works with borosilicate glass, which,
unlike standard glass, can withstand temperatures
of 300°C, as well as corrosive chemicals
and high pressures. She heats and softens the glass over a gas flame, then
uses a variety of tools to work it into shape. Perhaps most important is the 
glass-blowing lathe, with two spindles facing one another, both turning at 
precisely the same speed. On a day in December, with blowtorch in one 
hand and safety glasses firmly on (hot borosilicate glass gives off a dangerously
intense orange glare, not to mention lots of ultraviolet radiation), she
carefully attaches a section of glass to the end of a long tube mounted on 
the lathe, rounding it off to create a test tube the size of her arm. She uses 
a similar process to make her flasks and other more specialized glassware. 

To finish off, Davis bakes her wares at 560 °C in an annealing oven,
smoothing out stress points that could otherwise break the glass. With
diamond saws, tube cutters, lathe and oven, Davis estimates that her
set-up is probably worth around half a million Australian dollars 
(US$400,000), and as a result she prefers to keep a low profile — even 
in a quiet town. “I don’t tend to have clients come and see me, I don’t 
have a web page; it’s all word of mouth and previous clients.” 

She has plenty of those, garnered over a 20-year career that started 
when, as a newly qualified lab technician in Perth, she landed a job that 
included a glass-blowing traineeship. Davis admits that she had never 
heard of scientific glass-blowing before that. George Koutsantonis, a 
chemist at the UWA, describes her components as “vital” for his research 
on pyrophoric chemicals, which ignite spontaneously if exposed to air. 
“It’s not the sort of thing you can buy off the shelf,” he says. Davis's
strangest commission so far has been from some intrepid zoologists 
who asked her to make a glass funnel to hold over a dolphin’s blowhole 
in the hope of catching a sample for analysis. “I never got to see it in 
action,” she says. 

These kinds of weird and wonderful commissions are a lot rarer now. 
Thanks to financial pressures, only a handful of Australian universities 
still have an on-campus glass-blower — researchers have to order off- 
the-shelf glassware, and are less likely to request customized parts if they 
have to pay freelance glass-blowers out of tight budgets. 

Even counting those still plying their trade off-campus, there are only 
25 scientific glass-blowers left in Australia and New 
Zealand, says Davis. “There are just two of us in 
Western Australia that do it — the other guy is get- 
ting to retirement age. Hopefully I've got another 25 
years left in me, but the chance of training someone 
is probably not there. It's a dying art.”


> NATURE.COM 

For videos of the 
glass-blower and 
squid collector, see: 
go.nature.com/k5oule 


MATT DEVLIN 


DAVID STEPHENSON/CATERS NEWS 


The snake 
milker 


BY KELLY RAE CHI 


In nearly four decades collecting deadly snake venom, Jim
Harrison says, he has been bitten “only eight times”. And
although he remembers each one vividly, tallying them up on
his fingers can be tricky. An Indian cobra (Naja naja) mangled his
right little finger 12 years ago, leaving it curled and increasingly
sore until he had surgery to repair it. A bite from a desert horned
viper (Cerastes cerastes) dissolved part of the bone in his left
middle finger. Two other fingers, although functional, bear the
scars of his profession.

All this is par for the course when you nurture lethal snakes
for science. Harrison and his wife, Kristen Wiley, run the Kentucky
Reptile Zoo (KRZ) in Slade, which Harrison opened in 1990 as a
research and education centre. It houses 1,600 snakes from more
than 100 species, and it is one of just a handful of places around
the world producing snake venom for biomedical research.

Snake venoms contain a complex cocktail of enzymes and other
substances that help to immobilize or digest prey, and which are
of great interest to scientists. Drugs used to treat hypertension
have been modelled on substances in venom that drastically
lower the blood pressure of prey, for example. Other proteins in
venom have been used to identify and study specific signalling
molecules in the nervous system. And venoms are needed to
develop antivenoms. The KRZ sells about 1,400 grams of venom
per year.

Wiley and Harrison “provide a tremendous service, because 
most of us don’t have time to be zookeepers”, says Steven Aird 
at the Okinawa Institute of Science and Technology Graduate 
University in Japan, who has studied venom. “They really 
become not just suppliers, but almost collaborators in a sense.” 

Harrison’s fascination with snakes and other reptiles took hold 
when he caught a garter snake at the age of six. Throughout 
childhood, he read voraciously on reptiles and amphibians; at 
16, he worked on an alligator farm. 

Harrison started keeping venomous snakes as a hobby. He 
learned about venoms and extraction from books, including 
those written by Sherman Minton, a prominent herpetologist 
in Indianapolis with whom Harrison eventually became friends. 
Minton connected Harrison with others interested in venoms, 
and soon Harrison began to milk king cobras (Ophiophagus 
hannah) for university researchers. 

Harrison never believed that he could have a career involving 
snakes, so he became a police officer instead. But he continued 
extracting venom in a home laboratory equipped with a 
centrifuge to purify venom and a lyophilizer for freeze-drying 
it. At 26, after getting mown down by a stolen car while trying 
to make an arrest, Harrison’s heart stopped. He decided that 
policing was too dangerous, so he retired early and dedicated his 
career to snakes. Since then, snake bites have stopped Harrison’s 
heart three more times. 


DEADLY DISPLAY 

These days, Harrison and Wiley divide the work of running the 
reptile zoo. Wiley, who did an internship at the KRZ in 1998, 
manages the zoo’s educational programmes, reads the scientific 
literature and attends conferences to stay current on venoms 
and work out whether to breed a particular species that year. 

The actual milking falls to Harrison, who for liability reasons is 
the only staff member at the KRZ who does it. In front of a group of 
goggle-eyed schoolchildren, he demonstrates his technique on a 
monocled cobra (Naja kaouthia), a species that put him in hospital 
on life support after a bite in 2012. He pulls the 1.2-metre-long, 
dishwater-grey specimen onto a padded mat and pins its head 
down with the flat part of a long metal hook. 

Harrison grabs the cobra behind its head. As it reveals its fangs
(a natural response to threat), Harrison plants them through
a sheet of plastic film stretched across a funnel. He uses his
thumb and a partially missing forefinger to massage the muscle 
supporting its venom glands. He will do this on between 600 and 
1,000 snakes per week. If everything goes as it should, he says, then 
milking snakes is methodical — “boring”, even. In fact, according 
to data that a physician friend gathered on him, Harrison’s heart 
beats faster when he is driving to the supermarket than when he 
is milking. 

Stephen Mackessy at the University of Northern Colorado in 
Greeley says that the KRZ’s reputation and knowledge of venoms 
sets it apart. Some companies provide repackaged venoms, but 
the provenance of these products, which can matter greatly in 
research, is uncertain at best, he says. Wiley says that much of this 
comes down to understanding the animals, which she and Harrison 
breed themselves, but also obtain from zoos and universities. “We 
attempt, as much as we can, to provide the locale and the origin 
information to the researcher,” says Wiley. 

Harrison says that the benefits for medical researchers, and for
society, make him willing to take his daily calculated risks. “I don’t
plan on slowing down,” he says. “I will keep extracting until I die.”


The squid
collector


BY ELIE DOLGIN 


On a blustery morning in late October, the wind
is blowing up quite a swell (enough to make
this reporter heave his breakfast into the briny
deep), but Bill Klimm is unperturbed. The 78-year-old
fisherman sits calmly in his captain’s seat, arms
folded, staring straight ahead at the choppy waters off
the coast of Martha's Vineyard, Massachusetts, as his
boat, the Gemma, travels southwest.

Klimm and his co-captain, Dan Sullivan, are head- 
ing to Menemsha Bight in search of longfin inshore 
squid (Doryteuthis pealeii). These squid are prized 
for their giant nerve fibres, which allow biologists to 
study neurotransmission in exquisite detail. For the 
past 18 years, Klimm has been collecting these and 
other saltwater specimens for scientists at the Marine 
Biological Laboratory (MBL) in Woods Hole, Mas- 
sachusetts, and elsewhere around the world. 

From invertebrates such as sponges, worms, sea 
stars, urchins and anemones to several fish species 
and some plants, the creatures have a wide range of 
habits and dwelling places, but Klimm knows where 
to find them. And if he does not, he has a network of
local fishermen that he can tap for advice. 

David Remsen, who manages the Marine 
Resources Department at the MBL, tells Klimm 
what to catch on the basis of the orders he receives 
from scientists. He says that a good specimen collec- 
tor needs intuition for the local seas and the skills to 
maintain the boats that navigate them. Klimm has it 
all. “He knows the waters, he knows the equipment, 
and he takes ownership of both,” says Remsen. 

Klimm’s knowledge of marine biology runs deep, 
too. “If you want to understand something about the 
squid life cycle, you will learn more in ten minutes 
talking to Bill than you will spending a week talk- 
ing to so-called experts,” says Joseph DeGiorgis, a 
squid neurobiologist at Providence College in Rhode 
Island, and an adjunct faculty member at the MBL. 


A LIFE AT SEA

H. William Klimm III was born with Cape Cod 
fishing in his blood. His grandfather was a fisherman 
and lobsterman who owned a boatyard in Hyannis 
Harbor, Massachusetts. His father was a commercial 
fisherman operating out of Falmouth, who collected 
squid for the MBL as a sideline for 45 years — until 
the age of 88.

Klimm himself started fishing commercially
when he was 23 years old. He caught cod,
flounder, swordfish and lobsters for 30 years, 
until a boat fire cast him ashore in 1990. He 
fixed boats in Boston for five years before land- 
ing the MBL position. At the age of 60, after 
decades of long trips at sea, Klimm was finally 
home every night after work. “My wife calls it
a toy job,” he says.

David Bodznick, who studies the neuro- 
biology of behaviour at Wesleyan University in 
Middletown, Connecticut, has spent summers 
at the MBL for more than 30 years, researching 
electrosensing in skate. “You could really tell 
when [Klimm] came on board that changes 
had been made,” he says. For example, Klimm 
installed new reels and altered the nets to mini- 
mize damage to the squid and other animals. 
“The whole operation became more efficient,”
says Bodznick.

When Klimm and Sullivan reach their
destination at Menemsha Bight, they drop a
large net into the water, tow it along for 25 minutes,
pull in the line and sort through the catch,
all without exchanging a word. “We do it so
many times that we don't have to talk about it,”
Klimm remarks afterwards. Large squid (those
25 centimetres long or more) go in one
bucket; medium in another. Any small
or damaged animals get tossed to the
squawking seagulls overhead.

A majority of the hundred or so squid
collected today will be used to train neurosurgeons
attending a week-long teaching
course at the MBL. Some will go to the nearby
Woods Hole Oceanographic Institution,
where scientists are investigating the effects
of ocean acidification on squid physiology,
and 10-20 go to a visiting researcher at the
MBL, Yuyu Song, who is studying how misfolded
proteins affect neurotransmission in
the squid’s giant synapse. “The Gemma, her
captain and the MBL collecting expeditions 
are all very dear to me since a big part of my 
research would have been impossible without 
them,” says Song, a neuroscientist normally 
based at the Yale School of Medicine in New 
Haven, Connecticut. 

Back in port, Klimm talks about what he 
does with his free time, gesturing across the 
dock to his “play boat”, the Sea Dog IV. That is
where he and his wife can be found most week- 
ends in the summer, tooling around Martha's 
Vineyard and the Nantucket Sound. “For years 
and years and years I’ve done that — stepped 
from one boat to the other,” he says. “It’s kind 
of stupid, I suppose, but that’s what I do. That's 
what I do.”


The data
mechanic


BY EWEN CALLAWAY 


When Dawn Johnson opens the doors into her workspace,
the first thing you notice is the roar. The noise comes from
whirring fans, which are required to cool the towering stacks
of 16 computer servers that form walls of black and silver. Bundles of
multicoloured cables, as thick as small trees, trail upwards like an
electrical rainbow.

“If anything goes wrong, I'll be the first port of call,” Johnson says,
standing beside a toolbox the size of a shopping trolley. “I’ll rip ’em to
bits and find out.”

Computational biologists the world over rely on Johnson to do that, 
even though most do not know her. That's because Johnson is a computer- 
hardware engineer at the European Bioinformatics Institute (EBI) in 
Hinxton, UK. The servers that she keeps running hold one of the world’s 
most extensive collections of molecular databases — from an archive of 
DNA-sequencing data to the leading repository of protein structures. The 
machines that she and her colleagues maintain hold a whopping 60,000 
terabytes of data, and people at around half a million unique Internet 
addresses use these data each month. A blip in availability is not an option. 
“It’s imperative that it’s there 24/7,” says Johnson.

For Johnson, bearing the weight of the bioinformatics world on her 
shoulders was particularly burdensome late last year. Besides the centre 
in Hinxton, the EBI data had been spread across another two locations 
in London, but a contractor change meant that they had to move to a 
single location in a nearby town — and Johnson had to coordinate it. 
She and a small team of fellow engineers had to ensure that there was 
adequate space, power and cabling for the move, which involved roughly
9,500 computers connected by 850 power cables and 3,400 network
cables. “The complexity of that is hair-raising,” she says, with a relaxed 
shrug. Still, the move went “incredibly well”, says Steven Newhouse, 
head of technical services at the EBI — with Johnson playing a crucial 
part in coordinating the logistics. The success meant, of course, that the 
researchers who rely on the EBI never so much as noticed. “Very few 
scientists appreciate the size of the computing infrastructure that they 
depend on nowadays,” says Newhouse.

When she is not at her desk dealing with the ins and outs of such 
projects, Johnson spends her time in the Hinxton data centre, which 
the EBI shares with the neighbouring Wellcome Trust Sanger Institute. 
Johnson and her colleagues install, maintain and repair the machines 
that feed the centres’ seemingly insatiable hunger for data storage,
which is projected to reach 2 exabytes (2 × 10¹⁸ bytes, or 2 million terabytes)
by 2016. There are occasional emergencies. Several years ago, a
cooling-system failure forced Johnson to rush into work on a Saturday to 
keep the servers from overheating. She spent a stressful weekend getting 
the centre back online, so as to minimize the disruption to researchers. 

Computers were not the first machines that Johnson learned to rip 
apart. “My father’s a mechanic and an engineer, and so I was always in the 
garage with him fixing and tinkering with cars, and that was really what 
I wanted to do,” Johnson says. “But it was 1979 when I left school. They 
just didn't hire lady mechanics.” She went into secretarial work at a firm in 
Cambridge, UK, that sold and serviced computers for businesses. After a 
few years, her boss asked her what she wanted to try next, and she opted 
to work as a computer engineer. She was the only woman on the team. 

“When I was in the field, I was a novelty, I guess. But it was quite good. 
All the guys wanted to help me get on and succeed, and all the women 
saw me as sort of a stand for women’s lib and rights and stuff and were 
really on my side as well,” she says. Even now, “I don’t meet many other 
women in my career, which is a shame”. 

Johnson's move into the bioinformatics world happened by chance. 
In the 1990s, she was doing contract work on mainframe computers at 
the Sanger Institute, which had a leading role in the Human Genome 
Project. She remembers a celebration to mark the completion of a draft 
human-genome sequence. “I saw that happening and thought I would 
like to be a part of that,” says Johnson. A hardware-engineer job opened 
up five years ago, and she jumped at the opportunity. “It’s great when I 
drive into work and hear people on the radio talking about the latest studies,” 
she says. “I'm very proud and lucky to be part of it.” 
SEE EDITORIAL P.527 
Rock art thought to be about 4,000 years old in Libya’s Tadrart Acacus mountains was vandalized in 2009. 


Save Libyan archaeology 


Until violence eases and fieldwork can resume, fund research in labs, 
museums and on computers, urges Savino di Lernia. 


Libya is a hotspot for research into the 
human past. The Sahara, the largest 
hot desert in the world, was once green 
and hosted until a few thousand years ago 
the biggest freshwater lake on Earth. Some 
depictions of crocodiles and cattle engraved 
and painted on the walls of rock shelters in 
the Sahara date back 9,000 years. 

The desert is also a laboratory for investigating 
links between past climate changes 
and developments in human history. 
These include the dispersal of modern 
humans across Africa about 130,000 years 
ago, the oldest evidence of milking in 
Africa around 5200 BC and the establishment 
of the first Saharan state during the first 
millennium BC. 

Archaeological fieldwork in Libya is at a 
standstill. Four years after the Arab Spring 
and the February 2011 Libyan revolution 
that ended the regime of Muammar Gaddafi, 
violence remains rife. Recent escalations 
in fighting have injured and killed people 
and damaged the nation’s cultural heritage, 
infrastructure and free press. Libyan monu- 
ments have been seriously damaged, includ- 
ing the Karamanli mosque, built in 1738 in 
the capital, Tripoli, and Islamic tombs that 
date to between the tenth and twelfth centuries 
at Zuwila, near the west-central town 
of Murzuq. This, along with concerns about 
the illicit trafficking of cultural materials, 
led Irina Bokova, the director-general of 
the United Nations Educational, Scientific 
and Cultural Organization (UNESCO), to 
call for greater protection of Libyan cultural 
heritage in November last year. 

The destruction of archaeological sites in 
Syria, Iraq and Afghanistan (to name but 
a few other war-torn countries) is part of 
the same picture: what does not comply with 
militant revolutionaries’ aims is expendable 
or must be destroyed. 

I have worked in Libya since 1990. My 
last field trip to the Messak plateau in the 
southwest ended abruptly in February 
2011 with an emergency evacuation on a military 
aircraft. Before the revolution, I spent 
three months each year in the desert study- 
ing the prehistory of the Messak and nearby 
Tadrart Acacus mountains, which lie close 
to the border with Algeria and are famous for their 
9,000-year-old rock art. Since then, scientific 
and cultural relations between Libya and the 
international community have stagnated. 
Archaeological tourism — a major source of 
revenue and jobs for locals such as the Tuareg 
and Tebu people, the two major Saharan 
ethnic groups in Libya — has stopped. 

Even though a discussion of cultural her- 
itage might seem out of place in a country 
devastated by civil war, I argue that scientific 
research in the region must not be aban- 
doned. As UNESCO recognizes, culture 
has a powerful role in “building social cohe- 
sion and contributing to reconciliation and 
peace”. We must continue to nurture skills, 
trust and knowledge about our shared past. 
Until fieldwork in the region becomes possi- 
ble again, archaeological grant agencies must 
fund studies of materials in museum collec- 
tions and encourage desk-based research in 
collaboration with Libyan scientists. 


MELTING POT 

Stretching from the Mediterranean Sea to 
the heart of the Sahara, Libya was a crossroads 
for many ancient cultures, including 
the Phoenicians, Greeks and Romans. It 
hosts five UNESCO World Heritage Sites 
that illustrate the country’s historical diversity 
(see ‘Threatened heritage’): Cyrene, 
founded in about 630 BC by the Greeks, 
was a principal town of the Hellenic world; 
Leptis Magna, once part of the Phoenician 
city-state of Carthage, was incorporated in 
46 BC by the Romans into the province of 
Africa; Sabratha, west of modern Tripoli, 
was a Phoenician trading post that became 
an influential Roman town during the second 
and third centuries AD; the old town 
of Ghadames, known as the ‘pearl of the 
desert’, is noted for its outstanding traditional 
architecture and was earlier home to 
the Romans and Berbers; and the Tadrart 
Acacus mountains are rich in prehistoric 
rock art. 


THREATENED HERITAGE 

Violence and vandalism are destroying archaeological sites 
across Libya, from ancient cities to prehistoric rock art. 

[Map labels: UNESCO World Heritage Site; Cyrene; Benghazi. 
Annotations: ‘… castle was hit by rockets in 2014’; ‘A massive haul 
of archaeological materials was stolen from a bank in 2011.’] 

These fragile vestiges of the human 
past are vulnerable to natural and human 
threats. Harsh environmental conditions 
— temperature variations and wind ero- 
sion — take a toll on rock art and open-air 
archaeological sites in the desert. These 
sites are threatened by infrastructure devel- 
opment, reclamation of land for agricul- 
ture, exploitation of underground resources 
such as oil, water and gas, and vandalism. 
The same applies to sites in towns and vil- 
lages, such as the classical cities along the 
Mediterranean coast and the late-first-millennium 
BC Garamantian cemetery in 
the ancient Wadi al-Ajal river valley near 
the town of Germa. 


DYING DISCIPLINE 
The Tadrart Acacus is a place of unbelievable 
beauty; it used to be a global tourist desti- 
nation. Some of its sites were vandalized in 
2009; further damage — mostly graffiti — 
has been reported. 

Today, the site is inaccessible: no commercial 
flight connects Tripoli and Ghat, 
a nearby town (a weekly military aircraft 
brings food, essential goods and first-aid 
equipment). The tarred road between Ghat 
and Ubari is broken up, and clashes between 
the Tebu and Tuareg tribes increasingly 
affect the area. 

The ancient Greek city of Cyrene is at risk because of nearby construction. 

Perhaps the greatest threat to Libya's 
diverse heritage is the trafficking of archaeo- 
logical materials, for profit or to fund radical 
groups. This has already been documented 
in Syria and Iraq. No one has been able to 
fully assess the situation in Libya. Going to 
work among the black smoke of grenades, 
the men and women of the Libyan Depart- 
ment of Antiquities are doing their best. But 
museums are closed and the little activity left 
in the field is limited to the north. 


LOST OPPORTUNITY 

The Gaddafi regime neglected Libyan 
prehistory and relegated it to folklore. Clas- 
sical towns such as Sabratha, Cyrene and 
Leptis Magna were viewed negatively as 
links to a colonialist past. 

Among the hopes sparked by the revolu- 
tion was the idea of a more modern view of 
the archaeological and cultural heritage — 
as a gateway to a shared national identity, a 
major revenue source and a focus for forg- 
ing relationships with the rest of the world. 
Those hopes have been dashed. 

The international community took actions 
to safeguard Libyan heritage at a UNESCO 
meeting in Paris at the end of October 2011, 
while the revolutionaries were still fight- 
ing. I learned, in a small Parisian café, that 
Gaddafi had been killed. The next day, hopes 
for Libya’s future filled the meeting room. 
Participants unanimously decided to build 
a shared programme for the training of 
young Libyan archaeologists and scientists 
in Cyrenaica — the region where the revolu- 
tion started — and Shahhat (modern Cyrene). 

Partnerships between the Libyan govern- 
ment and UNESCO, with funding from Italy, 
were forged to strengthen Libya’s research 
and stewardship capacity in archaeology. 
Particular attention was given to building 
the national archive and training Libyan 
police, customs officers and workers in the 
antiquities department in the fight against 
the trafficking of cultural property. 

But few initiatives reached the field and 
now all efforts have essentially stopped. The 
escalation of hostilities in the past six months 
has led many foreign embassies to close and 
international organizations 
to move to neighbouring Tunisia. 
Large parts of North Africa are cut off by 
the civil war in Libya, 
the Tuareg rebellion in Niger, insecurity 
in Chad, Algeria and 
Mali, the presence of 
al-Qaeda and ISIS militants, and a large and 
uncontrolled circulation of weapons. Inter- 
national granting bodies such as universities 
and European and US research agencies have 
ceased to support field expeditions. 

Being a Saharan archaeologist today is 
a difficult job. Researchers fear being kid- 
napped or even killed. Insurance cover 
is hard to come by. We could transfer our 
activities to another place, but this would 
mean abandoning decades of ideas, invest- 
ment, and relations built with friends and 
colleagues. It would be hypocritical and sad. 


I strongly believe that scientific cooperation 
is an effective way to bring people closer, 
increase confidence and make cultures more 
open to one another. 


REKINDLE RESEARCH 

Fieldwork is vital to research and central 
to fundraising in archaeology. But in Libya 
— and other violence-wracked countries — 
archaeology as we have practised it has come 
to an end. Lengthy excavation campaigns 
will be impossible for years, if not genera- 
tions. Researchers must imagine a different 
future based on other methods. 

International funding and attention 
must return to scientific studies of Libyan 
heritage. Research should focus on exist- 
ing materials in museums and collections. 
Granting bodies should give greater priority 
to research that can be carried out on com- 
puters or in the laboratory. Sample analyses 
of archaeological materials can be done in 
international labs, where Libyan scientists 
should work and be trained. 

Building an online library of rock-art 
sites, with the involvement of Libyan stu- 
dents and colleagues from other countries, 
would help Libyan scientists to overcome 
their isolation and regain a sense of iden- 
tity. Museum collections that span from 
remote prehistory to the Islamic cultures 
should be digitized and made freely avail- 
able to a global audience. Unpublished col- 
lections held by international teams should 
also be digitized and shared online. Remote 
analysis of satellite imagery, for example, 
has been used to reveal lost Saharan cities 
(see go.nature.com/8ylgxh). 

International cooperation between local 
and foreign groups working in Libya must be 
supported. Travel funding and visas for Lib- 
yan scientists to work temporarily overseas 
should be found. And mobility programmes 
for scientists such as the European Union’s 
Erasmus Mundus should be exploited — 
Libya’s application numbers have been his- 
torically low. Energy companies and others 
with commercial interests in Libya should be 
encouraged to work with local stakeholders 
to help to train local personnel in scientific 
research. 

Without these steps, archaeological 
research in Libya, already moribund, will 
soon die. It would be gravely disappointing 
and paradoxical if, after years of neglect 
under the Gaddafi regime, Libyan archaeological 
heritage were once again abandoned. 
As well as a failure of the 2011 revolution, 
it would be a missed opportunity for a 
generation of young Libyan archaeolo- 
gists — and a tragedy for the safeguarding 
of monuments and sites of universal and 
outstanding value. 


Savino di Lernia is director of The 
Archaeological Mission in the Sahara, 
Sapienza University of Rome, Italy. 
e-mail: savino.dilernia@uniroma1.it 
A US soldier assists a member of his unit during the Battle of Okinawa in the Second World War. 


SOCIOBIOLOGY 


Altruists together 


Herbert Gintis applauds two books that powerfully 
enrich the dialogue on behavioural science. 


Are humans basically selfish yet 
coerced by society into curbing 
their instincts? Or are they basically 
altruistic but corrupted by unjust societies? 
These age-old questions are now asked by 
behavioural scientists and discussed in jour- 
nals such as Nature. Evolutionary biologist 
David Sloan Wilson's Does Altruism Exist? 
and science historian Michael Shermer’s The 
Moral Arc are brilliant contributions to this 
branch of sociopolitical discourse. 

Applying scientific principles to human 
society is hard. Society is a complex dynami- 
cal, adaptive nonlinear system. Moreover, 
rapid technical change, increased population 
density and globalization mean that we can- 
not reliably predict the future from the past. 
Even human nature, forged tens of thousands 
of years ago, turns out to be stunningly plastic. 

Wilson's question is: do actions that mainly 
benefit unrelated others at personal cost exist? 
Could anyone doubt it? We give to charity, 
vote for public education even when we have 
no children, and volunteer to fight and die in 
war. People conform to social norms even 
when no one is looking, and punish the antisocial 
behaviour of others even when it is 
costly to do so. Yet for decades, a countervailing 
theory has held in biology and economics. 


Does Altruism Exist?: Culture, Genes, and the Welfare of Others 
DAVID SLOAN WILSON 
Yale University Press: 2015. 

The Moral Arc: How Science and Reason Lead Humanity toward Truth, Justice, and Freedom 
MICHAEL SHERMER 
Henry Holt: 2015. 

Richard Dawkins, in The Selfish Gene 
(Oxford University Press, 1976), reflected 
the opinion then current among biologists: 
“Let us try to teach generosity and altruism, 
because we are born selfish.” Some 35 years 
later, in Nature, 137 evolutionary biologists 
petitioned that “natural selection leads organisms 
to become adapted as if to maximize 
their inclusive fitness” (P. Abbot et al. Nature 
471, E1-E4; 2011) — even in the most highly 
social species, individuals primarily help rela- 
tives. In fact, inclusive-fitness maximization 
is a pious wish of many population biologists 
that has never been validated in theory or fact. 

Wilson's basic principle is the group-selec- 
tion credo: “Selfishness beats altruism within 
groups. Altruistic groups beat selfish groups. 
Everything else is commentary” (D.S. Wilson 
and E. O. Wilson Q. Rev. Biol. 82, 327-348; 
2007). As Charles Darwin noted in The 
Descent of Man (Murray, 1871), a hunter- 
gatherer band with many brave, altruistic 
soldiers will triumph over a group made up 
mostly of selfish cowards, even though the 
best thing of all for an individual is to be a 
coward surrounded by brave compatriots. 
The mathematics supports this scenario. 

It is fashionable to question this view, but 
the theoretical issues have been resolved for 
decades. Groups do not mate or produce off- 
spring, and so do not have biological fitness. 
Rather, the social organization of a species, 
its mating patterns and social groupings, is 
inscribed in the genomes of species members. 
Groups with successful social organization 
tend to enhance the fitness of their members, 
whose genomes code for this organization. 
Altruism can evolve in such groups, provided 
that altruists tend to be grouped preferentially 
with other altruists, in which case their bio- 
logical fitness can on average be at least as 
high as that of selfish types. 

As Wilson shows, another important 
source of human success is that cultures stress 
cooperation within the group, and so pun- 
ish antagonistic individuals. This has led to 
humans ‘domesticating themselves’, favouring 
a human nature that is relatively docile and 
dependent on the company and approval of 
others. Moreover, humans have evolved to 
coordinate their behaviour, each member of 
a team ‘reading the minds’ of the others and 
identifying with common goals (see Michael 
Tomasello’s A Natural History of Human 
Thinking, Harvard University Press, 2014). 

Shermer’s The Moral Arc, although 
grounded in behavioural game theory and 
social psychology, is the more speculative 
book. He offers a defence of science and rea- 
son as emancipatory tools in the face of big- 
otry, pseudoscience and faith. He, too, argues 
that humans are basically moral and coopera- 
tive, but adds that they are parochial. When 
their community is threatened, people turn 
compassion for kin into hatred for outsiders. 

Shermer’s central point is that even evil 
people are generally motivated by their own 
particular morals. In the perpetrators’ minds, 
violence against outsiders is the application 
of justice. This requires that the enemy be 
deemed inferior and the cause of problems 
— an excuse historically manipulated by 
Machiavellian leaders to gather support for 
their ambitions, as when the Nazis blamed the 
Jewish people for Germany’s economic woes. 

This is where science, technology and 
reason come into play, Shermer argues: the 
growth of global information and communi- 
cations networks has rendered it increasingly 
difficult to perpetrate the falsehoods that let 
authoritarian leaders maintain their rule. For 
Shermer, an increasingly educated popu- 
lace with access to information undermines 
parochialism and pseudoscience, by allowing 
people to judge for themselves. The role of 
smartphones and social media in fuelling the 
2011 Arab Spring uprisings is a case in point. 

This is a welcome turnaround from The 
Believing Brain (Times, 2011), in which Sher- 
mer argued the rather nihilistic position that 
“beliefs come first, explanations for beliefs 
follow”. In The Moral Arc, Shermer, founder 
of the Skeptics Society, adheres to Enlighten- 
ment thought. His subtitle, How Science and 
Reason Lead Humanity Toward Truth, Justice, 
and Freedom, evokes the call to arms of phil- 
osopher Immanuel Kant in his 1784 What 
is Enlightenment?: “Have the courage to use 
your own understanding.” 

Some of Shermer’s positions would have 
surprised Enlightenment writers. Kant, for 
instance, believed that the oppressive state 
and authoritarian church were the sole 
impediments to truth and justice. We know 
now that even people with access to the ballot 
box and free expression can embrace intol- 
erant and obscurantist doctrines. Moreover, 
Voltaire and others believed that the unedu- 
cated could not apply reason to the affairs of 
life. Shermer, by contrast, is a vigorous propo- 
nent of political democracy and equal rights. 

Shermer’s is an exciting vision, but he is 
mistaken in thinking that truth, freedom 
and justice are the inevitable by-products of 
scientific advance. Modern liberal democracy 
is the product of masses of people collectively 
throwing off the yoke of authoritarian states. 
But the power of popular action was made 
possible by a military technology: the hand- 
gun. This displaced elite cavalry and required 
nations to give the vote to peasants and citi- 
zens, who became the lifeblood of military 
defence. Even today, the United States, with its 
formidable drones and missiles, cannot win a 
war without ‘troops on the ground’. 

We must be on constant guard against new 
instruments of information control, persecu- 
tion and death that could once again render 
secular and religious totalitarianism a viable 
social alternative. Constant vigilance by 
altruists such as Wilson and rationalists such 
as Shermer may in the end win the day. 


Herbert Gintis is external professor at the 
Santa Fe Institute in New Mexico. His most 
recent book is A Cooperative Species (with 
Samuel Bowles). 

e-mail: hgintis@comcast.net 


Books in brief 


Most Wanted Particle: The Inside Story of the Hunt for the Higgs, 
the Heart of the Future of Physics 

Jon Butterworth EXPERIMENT (2015) 

The Higgs boson may seem amply biographized, but Jon 
Butterworth’s account of its 2012 discovery offers deep context. As 
a physicist on the ATLAS experiment at CERN — the Higgs hunting 
ground near Geneva, Switzerland — Butterworth is an insider’s 
insider. His narrative seethes with insights on the project’s science, 
technology and ‘tribes’, as well as his personal (and often amusing) 
journey as a frontier physicist. Glossaries on the standard model of 
physics, Feynman diagrams and more are included. 


Sea of Storms: A History of Hurricanes in the Greater Caribbean 
from Columbus to Katrina 

Stuart B. Schwartz PRINCETON UNIVERSITY PRESS (2015) 

Ten years ago, Hurricane Katrina killed more than 1,800 people 
and submerged 80% of New Orleans, Louisiana. Historian Stuart 
Schwartz frames that catastrophe within five centuries of hurricanes 
in the greater Caribbean — natural disasters that mirrored and 
exacerbated the violent social upheavals that erupted as European 
nations pursued New World riches. Today, a mix of political 
vagaries and patchy official disaster response presents dangerous 
ambiguities in a region where more cyclones are a certainty. 


Touch: The Science of Hand, Heart, and Mind 

David J. Linden VIKING ADULT (2015) 

A touching story? A tactless comment? So elemental is the sense 
of touch that it permeates metaphors we live by. In this succinct 
treatise, neuroscientist David Linden explores the “weird, complex, 
and often counter-intuitive” tactile system and its intimate impact 
on the human experience. Through scores of scientific studies and 
anecdotes, Linden investigates phenomena ranging from the two 
separate touch systems in the skin (one slow, one fast), to a detailed 
‘cast list’ for the main neurophysiological players in orgasm, such as 
the somatosensory cortex, amygdala and cerebellar nuclei. 


Melting Away: A Ten-Year Journey through Our Endangered 
Polar Regions 

Camille Seaman PRINCETON ARCHITECTURAL PRESS (2014) 

In the space of a generation, Antarctica and the Arctic have 
metamorphosed from remote frontiers to cruise destinations, 
their icy reaches and charismatic wildlife exhaustively mapped 
and filmed. But writer and photographer Camille Seaman (see 
J. Hoffman Nature 492, 40; 2012) has a rare gift for making them 
seem arrestingly alien again. Her coffee-table book is the product of 
ten years at the poles; its images alone are a compelling argument 
for protecting the wonder and strangeness at the ends of the Earth. 


Fantasy Islands: Chinese Dreams and Ecological Fears in an Age of 
Climate Crisis 

Julie Sze UNIVERSITY OF CALIFORNIA PRESS (2015) 

Carbon-neutral, zero-waste and home to 500,000 people: the Chinese 
eco-city of Dongtan seemed a radical urban dream. But the city, to 
be sited near Shanghai on Chongming (the world’s biggest alluvial 
island), remains a blueprint. As Julie Sze argues in this thoughtful, 
if uneven, analysis of Chinese “eco-desire”, the culprit could be 
irreconcilable beliefs in harmony with nature, and the ability of 
autocratic political structures to enact radical change. Barbara Kiser 
Correspondence 


In praise of Holt as 
head of the AAAS 


The American Association for the 
Advancement of Science (AAAS) 
has chosen Rush Holt, who was 
a Democratic congressman 
for eight terms, as its new chief 
executive. Daniel Sarewitz 
attacks this choice as “political” 
(Nature 516, 9; 2014), but it is not 
partisan. 

The AAAS announcement 
praises Holt for broadly 
promoting “the value of science 
communication, particularly for 
conveying information about 
climate change”. In its March 2014 
Climate Science Panel report, the 
AAAS talked bluntly about the 
dangers of inaction and of poor 
science communication — a view 
you share in calling on scientists 
to ensure that “they are not bested 
in the court of public opinion” 
(Nature 464, 141; 2010). 

Sarewitz contends that the 
AAAS is “anointing a leader who 
could take up the fight” with 
climate-science deniers, among 
whom are many Republican 
politicians. But in Holt, the AAAS 
has a scientist who understands 
the fight that we are in (see also 
Nature 471, 265-266; 2011) and 
is well placed to defend it from 
attacks by Congress. He is an 
inspired choice. 

Joseph Romm Center for American 
Progress, Washington DC, USA. 
jromm@americanprogress.org 


Shale gas: nuance in 
output predictions 


You claim that the most recent 
estimates of future output for 
shale gas in the United States have 
become more conservative, but 
in our view this is a red herring 
(Nature 516, 28-30; 2014). 
State-of-the-art projections 
for the world’s future shale-gas 
supplies hinge on improved 
quantification of the uncertainty 
range and reducing its spread 
(called ‘de-risking’ a shale play) 
as experience and technology 
advance. Several of the studies 
you quote include uncertainty 
ranges that explain the current 
spread in forward projections 
of future US gas supplies. That 
crucial nuance was missing from 
your graphic, however, which 
shows only a simplified, discrete 
forward-production prognosis. 

Comparing just one scenario 
from the study by the team at 
the University of Texas at Austin 
with another from the US Energy 
Information Administration's 
shale-gas outlook, omitting 
uncertainty ranges, creates an 
apparent mismatch where one 
may not in fact exist. 

As a result of technology 
innovation (see also Nature 516, 
7; 2014), the United States is 
today drilling 3-kilometre-long 
horizontal wells and conducting 
30-stage fracture treatments 
at depths of 3.7 km. Further 
technological gains will increase 
global oil and gas output (see, for 
example, S. Neff and M. Coleman 
Energy Strategy Rev. 5, 6-13; 
2014). Oil and gas prices also 
drive global shale development. 

No one can accurately 
predict both the technology 
improvement rate and future 
wellhead prices, so we have to 
rely on a range of forecasts based 
on a variety of assumptions. 
Steve Holditch, Dan Hill, 
Ruud Weijermars Texas A&M 
University, College Station, USA. 
r.weijermars@pe.tamu.edu 


Shale gas: hardly a 
fallacy 


We believe that your comparison 
of US fracking forecasts creates 
a false dichotomy between 
modelling results from the 
US Energy Information 
Administration (EIA) and the 
Bureau of Economic Geology at 
the University of Texas at Austin 
(Nature 516, 28-30; 2014). 

Our integrated team of 
scientists, engineers and 
economists at the University of 
Texas has built rigorous models 
that incorporate a wide range 
of input variables and well- 
constrained outcome scenarios. 
In our view, the comparison of 
just one simulation run with 
a single outlook from the EIA 
trivializes a complex problem and 
fails to represent accurately the 
rigour and uniqueness of what is 
being accomplished in our four- 
year study (see go.nature.com/ 
zfverj). 

Your graphic ‘Battle of the 
forecasts’ is partially attributed 
to our data. Although we present 
preliminary results at conferences 
and make them available on 
our website, we explained to 
the author that our work on 
the Haynesville and Marcellus 
plays was not yet finished or 
published, and requested that it 
should not be used. We therefore 
question why you should choose 
to base the main thread of your 
argument on a comparison to our 
unfinished work. 

Finally, I find your headline 
‘The fracking fallacy’ potentially 
misleading: in isolation, it reads 
as a negative comment on the 
fracking process itself, rather 
than on forecasts of natural gas 
production. Production of US oil 
is currently at a 30-year high, and 
of natural gas at an all-time high. 
Hydraulically fractured wells 
account for more than half and 
almost half, respectively, of US 
natural gas and oil production. To 
imply otherwise does a disservice 
to your readers. 

Scott W. Tinker, Svetlana 
Ikonnikova The University of 
Texas at Austin, Texas, USA. 
scott.tinker@beg.utexas.edu 


Editorial note: Scott Tinker and 
Svetlana Ikonnikova informed 
Nature that their study was 
unpublished. They subsequently 
made the data publicly available, 
at which point Nature used 
that information and gave 
appropriate credit. 


United Nations 
highlights soil crisis 


Some 500 years after Leonardo 
da Vinci declared that more was 
known about celestial bodies 
than about the soil underfoot, the 
United Nations has proclaimed 
2015 the International Year 
of Soils. This offers a unique 
opportunity to address the crisis 
in soil sustainability (see 
www.fao.org/soils-2015). 

Among the factors 
undermining soil quality are 
intensive farming, industrial 
activity and increasing 
urbanization. Soil contamination 
is threatening food production, 
water potability and ecosystem 
services, notably in large parts 
of China. Safeguarding soils is 
therefore crucial to the UN Post- 
2015 Development Agenda and 
the Sustainable Development 
Goals (go.nature.com/s7jcik). 

Initiatives that are already 
under way for the sustainable 
management of complex 
soil systems include the 
Intergovernmental Technical 
Panel on Soils, the Global Soil 
Biodiversity Initiative, the 
Global Network of Critical 
Zone Observatories, and the 
International Soil Modeling 
Consortium. 

Henry Lin Institute of Earth 
Environment, Chinese Academy 
of Sciences, Xian, China; and 
Pennsylvania State University, 
University Park, USA. 

Rainer Horn Christian Albrechts 
University zu Kiel, Germany. 
henrylin@psu.edu 


CORRECTIONS 

The ‘West Asia’ article in the 
Nature Index (Nature 515, S88- 
S89; 2014) stated that King 
Abdullah University of Science 
and Technology had an article 
count of 121 and a weighted 
fractional count of 9.96. In fact, 
it was King Abdulaziz University 
that had these values. 

In the Nature Index China, 
the ‘Chinese Academy of 
Sciences’ article (Nature 516, 
S56–S57; 2014) should have 
affiliated Peng Zhang to the 
Institute of Plant Physiology 
and Ecology. And in the 
‘Beijing’ article (Nature 516, 
S60–S61; 2014), Ning Jiao’s 
quote was mistranslated, so it 
has been updated. 


29 JANUARY 2015 | VOL 517 | NATURE | 553 
© 2015 Macmillan Publishers Limited. All rights reserved 


OBITUARY 


Donald Metcalf 


(1929-2014) 


Discoverer of hormones that regulate blood-cell proliferation. 


Donald Metcalf established which
blood cells give rise to which,
and identified the hormones
that regulate the cells’ proliferation and 
differentiation. His work, which shed 
light on how to boost people’s sup- 
plies of white blood cells, has benefited 
millions. 

Metcalf — Don to nearly every- 
one who worked with him — died on 
15 December 2014. He was born in 1929 
in Mittagong in the Southern Highlands 
of New South Wales, Australia, to school- 
teachers. During his medical degree 
at the University of Sydney, Metcalf’s 
passion for research was ignited when 
he spent a year studying the ectromelia 
virus, which causes ‘mousepox’.

Metcalf received his degree in 1953 and 
moved to Melbourne in 1954 to join the 
Walter and Eliza Hall Institute of Medi- 
cal Research (WEHI). Aside from brief 
trips to Europe and the United States, 
he spent his 60-year career at the insti- 
tute, supported throughout by the Carden 
Fellowship of the Anti-Cancer Council of 
Victoria (now the Cancer Council). 

Metcalf’s arrival at the WEHI was not 
smooth. Its then director, Frank Macfarlane 
Burnet, who was later awarded the Nobel 
Prize in Physiology or Medicine for his 
work on immunity, was not a fan of cancer 
research. According to Metcalf, Burnet, like 
many others at the time, viewed cancer as 
“an inevitable disease”, and cancer researchers
as either “rogues or fools”. Metcalf was
undeterred. 

For ten years he worked on cell turnover 
in an immune organ called the thymus, until 
a chance finding changed his focus. Metcalf, 
with Ray Bradley of the University of Mel- 
bourne, discovered that he could grow colo- 
nies of blood cells in agar, provided that the 
right stimulus was added. 

Metcalf’s genius lay in realizing that he 
could use this system to work out how cell 
types are related, and also to characterize 
the hormones that regulate the cells’ pro- 
liferation and differentiation. He named 
these hormones colony-stimulating fac- 
tors (CSFs). Over the next 50 years, Metcalf 
made the blood-cell system the model for 
understanding the regulation of cell growth 
in body tissue. 

Early on, Metcalf realized that to purify 
CSFs, and to clone the genes that encoded 
them, he would need collaborators. He
recruited young faculty members with
the required biochemical and biophysi- 
cal skills to work with him at the WEHI. 
The group later collaborated with several 
molecular biologists who came to work at 
the Melbourne branch of the international 
Ludwig Institute for Cancer Research. 
The branch was directed by Tony Burgess, 
previously a laboratory head in Metcalf’s 
WEHI Cancer Research Unit. 

From 1965 to 1985, Metcalf and 
his team identified and purified four 
colony-stimulating factors: granulocyte- 
macrophage CSF (GM-CSF), granulocyte 
CSF (G-CSF), macrophage CSF (M-CSF) 
and multi-CSF, now known as interleukin-3.
The team also cloned the gene for one of 
these, GM-CSF; the other genes were cloned 
by groups around the world. 

This work paved the way for mass 
production of the hormones and the experi- 
ment that Metcalf had long dreamed of: 
injecting CSFs into animals. His niggling 
doubt was that the factors — purified using 
a contrived in vitro assay — might be irrel- 
evant to normal physiology. He needn't have 
worried. In mice, the CSFs triggered a spec- 
tacular rise in the number of white blood cells 
in the bone marrow and peripheral blood. 

Clinical applications soon followed. The 
most widespread use of CSFs has been to 
ameliorate leukopenia, a decline in the 
number of white blood cells associated with
chemotherapy. During phase I clinical
trials of G-CSF conducted in the late 
1980s, Metcalf and his collaborators 
noticed that administering the hormone 
prompted large numbers of haemopoietic 
stem cells — precursor cells that give 
rise to all types of blood cell — to move 
from a person's bone marrow into their 
peripheral blood. 

The finding allowed clinicians to 
harvest stem cells simply by injecting 
people with G-CSF and taking their 
blood, rather than by extracting the cells 
from bone marrow — a more painful 
and complicated procedure. The new 
method made the transplantation of 
blood stem cells safer, easier, more effec- 
tive and ultimately more widely used. In 
the past 20 years, 20 million people are 
thought to have benefitted from Metcalf’s 
discoveries. 

None of the many prizes that Don 
received conveys the degree to which he 
was a scientist’s scientist. He distrusted 
researchers who had turned their back on 
the bench; he always worked in the labora- 
tory, assisted by one or two research assis- 
tants and an occasional graduate student or 
postdoctoral fellow. He had an incredible 
work ethic. Having worked in the lab for 
eight or nine hours, Don would write papers 
or books at home. He also detested spin. His 
inclination was to produce one paragraph of 
discussion for each page of results — not a 
word more. 

The only thing that Don valued more 
than his science was his family — Jo, his wife 
of more than 60 years, his four daughters 
and six grandchildren. Last August, when 
Don was diagnosed with incurable meta- 
static pancreatic cancer, he faced a dilemma: 
how could he continue to do experiments 
and spend as much time as possible with 
his beloved Jo? He found a solution: he had 
his microscope moved to his dining-room 
table. Don continued to work, surrounded 
by his loved ones, until early November — 
exactly as he wanted it.


Douglas Hilton is director of the Walter 
and Eliza Hall Institute of Medical Research 
in Melbourne, Australia, and head of the 
Department of Medical Biology at the 
University of Melbourne. He first worked 
with Don Metcalf as an undergraduate
in the 1980s.

e-mail: hilton@wehi.edu.au 
Emergency back-up for lung repair


Influenza virus severely damages the epithelial tissue that lines the lung. Findings suggest that, in mice, activation of a
back-up population of stem cells mediates effective repair of the injured lung. SEE LETTERS P.616 & P.621 


EMMA L. RAWLINS 


The lung has low rates of day-to-day cell
turnover but a tremendous potential
for repair following injury. Despite
this regenerative ability, the existence of 
dedicated lung stem cells has been hotly con- 
tested’, and the characteristics of any stem-cell 
population that may reside in this tissue are 
unknown. In two papers in this issue, Vaughan 
et al.” (page 621) and Zuo et al.’ (page 616) 
show that when the epithelial cells lining the 
interior of the lung are damaged by infection 
with influenza virus, a rare stem-cell popula- 
tion is induced to proliferate and migrate to 
the damaged site. There, this population can 
differentiate into several cell types. 

Over the past five years, it has become clear 
that many types of the differentiated secre- 
tory epithelial cells in the lung can function 
as stem cells at steady state and following mild 
injury*°. But what happens in response to 
severe injury is less clear. Infecting mice with 
influenza is a useful way to severely injure lung 
cells, both to investigate cell behaviour and 
to model the human disease’. In response to 
influenza, a cell population seems to migrate 
from the airways into the alveoli, where they 
remain’. However, the identity of these cells 
and their specific contribution to alveolar 
repair have been unknown. 

The current papers address this issue using 
lineage tracing in mice, in which a genetic trick 
indelibly marks a cell population of interest 
and all of its descendants. Vaughan and col- 
leagues traced populations of differentiated 
secretory epithelial cells, which maintain 
normal lungs and repair minor injuries*° 
(Fig. 1a), and showed that these cells are not 
involved in influenza-induced repair. Impor- 
tantly, they also showed that spurious results 
could be obtained if care was not taken with 
the lineage-tracing technique. Both groups 
defined a previously unnoticed cell popula- 
tion containing stem cells that mediate repair 
following influenza. 

The stem cells, which are located in the
airways, were dubbed distal airway stem cells
by Zuo et al., and lineage-negative progenitors
by Vaughan and co-workers. They are
rare, undifferentiated, basally located cells
(that is, their top surface does not extend
up to the air space of the airway). The cells
express the cytokeratin 5 (Krt5) protein and
the transcription factor Trp63, either singly or
together. These molecules are widely expressed
in basally located, dedicated stem cells in other
epithelial tissues’.

Figure 1 | Stem cells that mediate lung repair. a, The airways and alveoli of the lung are lined with
distinctive epithelial cells. Vaughan et al.” and Zuo et al. identify a population of Krt5-expressing stem
cells that are rare and inactive under steady-state or mild-injury conditions. b, Krt5-expressing stem cells
are activated in response to the severe lung damage caused by influenza infection. These cells proliferate
and migrate into the alveoli. c, Zuo and colleagues report that these cells differentiate into both airway
and alveolar lineages, resulting in productive lung repair. By contrast, in the mice analysed by Vaughan
and co-workers, the cells differentiated into airway lineages, but remained undifferentiated in the alveolar
epithelium until experimental inhibition of the Notch signalling pathway allowed differentiation to occur.

Both groups traced a Krt5-expressing subset 
of the cells and showed that these cells move 
into the alveoli following injury (Fig. 1b). 
Furthermore, Vaughan et al. used time-lapse 
microscopy to analyse lung slices through 
live-cell imaging. They found that the Krt5- 
expressing stem cells can migrate through 
alveolar walls, indicating that epithelial cells 
can migrate farther to contribute to repair than 
previously thought. By contrast, Zuo and col- 
leagues killed the cells at the beginning of the 
migration process. This resulted in failure of 
alveolar repair, thus demonstrating that these 
stem cells are necessary for effective repair. 

Despite this broad agreement between the 
two studies, they differ in interesting ways.
The groups traced Krt5-expressing stem cells 
and their descendants using mice that carried 
slightly different genetic modifications*”. Zuo 
et al. demonstrated that the cells they traced dif- 
ferentiate into mature alveolar cells (Fig. 1c). By 
contrast, the cells traced by Vaughan et al. move 
into the alveoli and remain there after infection, 
but never differentiate into mature cells. 

How do we reconcile these differences? One 
possibility is a discrepancy in the extent of 
injury. The route by which each group infected 
their mice was different (intratracheal or intra- 
nasal). There may also have been small differ- 
ences in virus levels, or in the genetic make-up 
of the mice used. To avoid confusion in the 
future, influenza models should be standard- 
ized, or the extent of injury characterized in 
each experiment. Alternatively, the two labora- 
tories may have been sampling slightly differ- 
ent subsets of cells from the same population. 




It remains unclear how many stem-cell 
populations there are in the lung, although 
the Krt5-expressing population characterized 
in these studies seems to reconcile previous, 
apparently disparate, findings”’®. But how 
many cell types are included in this popula- 
tion? Vaughan and colleagues analysed the 
transcriptional profiles of individual stem cells, 
and provided preliminary evidence that the 
population contains many different cell types 
(it is heterogeneous). However, this may reflect
current limitations in techniques for isolating 
pure populations of these cells. A comprehen- 
sive answer to the question will require in vivo 
experiments in which individual stem cells are 
lineage-traced, coupled with further single-cell 
molecular analysis. Such experiments would 
determine whether the cells constitute a single 
population, and so the different results reflect 
different experimental conditions, or whether 
the population is truly heterogeneous and per- 
haps contains cells with different capacities for 
migration or differentiation. 

Nevertheless, the differences between the 
two studies mean that each provides distinct 
insights that could be relevant to human health. 
Vaughan and co-workers found that chemical 
inhibition of the Notch signalling pathway 
was required for their cells to differentiate into 
mature alveolar cells (Fig. 1c). Moreover, they 
identified regions of hyperactive Notch signal- 
ling in human lungs that had defective alveolar 
repair, implicating this pathway in the develop- 
ment of chronic lung conditions. By contrast, 
Zuo et al. reported that Krt5-expressing stem 
cells could be grown in culture and transplanted 
into influenza-infected mice lacking their own 
stem cells. This restored the repair process, 
opening up the possibility that stem-cell therapy 
will eventually be used to treat lung conditions. 
These findings require much more fundamental 
research. But there are many lung conditions for 
which only palliative therapies are available — a 
powerful incentive to explore the possibilities. 

A key question is how these lung stem cells 
interact with their associated mesenchymal 
(non-epithelial) cells to mediate a functional 
repair process. In the adult skin, mesenchymal 
cell populations from differing embryonic ori- 
gins have distinct roles in wound repair’. Lung 
mesenchymal cell populations are just begin- 
ning to be characterized”. If any medical appli- 
cations are to arise from studies of lung repair, 
the full cellular picture will be required.


Emma L. Rawlins is at the Wellcome Trust/ 
Cancer Research UK Gurdon Institute, the 
Wellcome Trust/MRC Stem Cell Institute and 
in the Department of Pathology, University of 
Cambridge, Cambridge CB2 1QN, UK. 
e-mail: e.rawlins@gurdon.cam.ac.uk
Stellar clocks


A link between rotation and age for Sun-like stars has long been known, but a 
stringent test of it for older stars has been lacking. The Kepler mission helps to fill 
this gap with observations of an old star cluster. SEE LETTER P.589 


DAVID SODERBLOM 


The clocks of the cosmos tick constantly,
but too softly for us to hear, and so we 
cannot directly measure the ages of 
stars. Yet we need good ages, because nearly 
every aspect of astrophysics deals with how 
things evolve with time. Even the Sun is silent 
on this score, and it is only from being able to 
study Solar System material — meteorites — in 
the laboratory that we know the Sun’s age so 
exquisitely: 4,567 ± 1 ± 5 million years’. In this
issue, Meibom et al.” (page 589) describe a key 
step needed to obtain better estimates for the 
ages of cool, low-mass stars like the Sun. 
Many of the stars for which we would like 
to know the ages are like the Sun. They live 
for 10 billion years or more, and so represent 
the entire age of our Galaxy’s disk, which is 
where most of the Galactic stars reside. Sun- 
like stars are the ones that naturally excite the 
most interest in our quest for other Earths, 
and, indeed, the first question that will be 
asked when someone reports signs of life on 
an exoplanet will be, ‘how old is the host star?’,
because we will want to place the discovery in 
an evolutionary context. 
What makes the Sun and stars like it
favourable for life-bearing planets is that they
change very slowly over time. But that also 
makes them inaccessible to age estimation 
by conventional means, which is by deter- 
mining a star’s temperature, luminosity and 
composition and then comparing them with 
stellar models. Those models themselves are 
calibrated against the Sun, the only star with 
well-established fundamental properties. 

Given this problem of estimating ages, we 
settle for what we can get’, and it has been sus- 
pected for some time that, for Sun-like stars, 
rotation declines with age on something close 
to a power-law relation’ — a nice straight line 
in a log-log plot. This is convenient, in that 
our concern with accuracy roughly scales as 
the age itself, and if we could reliably derive 
ages that are good to 5% or better, that would 
improve our knowledge of stellar and Galactic 
processes substantially. 
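
As a rough illustration of what such a power-law clock implies, the following Python sketch calibrates a relation of the form P ∝ t^n to the Sun and inverts it to turn a rotation period into an age. The index n = 0.5 and the solar rotation period of 25.4 days are assumptions chosen for illustration, not values from the study discussed here, and the function names are hypothetical.

```python
import math

# Illustrative constants. Only the solar age comes from the text above;
# the rotation period and the power-law index are assumptions.
SUN_AGE_MYR = 4567.0      # solar age in million years (quoted in the text)
SUN_PERIOD_DAYS = 25.4    # assumed solar rotation period
SPIN_DOWN_INDEX = 0.5     # assumed index n in P proportional to t**n


def rotation_period(age_myr, n=SPIN_DOWN_INDEX):
    """Rotation period in days predicted by a Sun-calibrated power law."""
    return SUN_PERIOD_DAYS * (age_myr / SUN_AGE_MYR) ** n


def age_from_period(period_days, n=SPIN_DOWN_INDEX):
    """Invert the power law to get an age in million years from a period."""
    return SUN_AGE_MYR * (period_days / SUN_PERIOD_DAYS) ** (1.0 / n)


def loglog_slope(age_a_myr, age_b_myr, n=SPIN_DOWN_INDEX):
    """Slope of log10(period) against log10(age) between two ages."""
    p_a = rotation_period(age_a_myr, n)
    p_b = rotation_period(age_b_myr, n)
    return (math.log10(p_b) - math.log10(p_a)) / (
        math.log10(age_b_myr) - math.log10(age_a_myr)
    )
```

Under these assumptions the relation is a straight line of slope n in a log-log plot, and a fractional error in the measured period is roughly doubled in the inferred age, so a period known to 2.5% would give an age good to about 5%.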

But why does the Sun spin so slowly, as much 
as 100 times more slowly than some young 
Sun-like stars, and is the spin rate a reliable 
clock? As was shown’ in 1967, slow rotation is 
a property shared by all Sun-like stars, and the 
mechanism underlying the phenomenon starts 
right where the outer layers of the stars begin to 
become convective. Thus, it is convection that
makes the Sun and stars like it Sun-like. In the
Sun, convection and rotation lead to complex
motions of the conducting plasma of the convective
outer layer. Those motions generate a
magnetic dynamo that is based in or just below
the outer layer. The magnetic field produced
by the dynamo can then grip the solar wind
of high-speed charged particles beyond the
Sun’s surface, and so angular momentum is
steadily lost.

Figure 1 | Star cluster NGC 6819, the concentration of stars visible in the centre of the image.

The loss of angular momentum is a slow 
process for the present-day Sun, but we also 
know that, because of their faster spin rate, 
young Suns generate much stronger magnetic 
fields. The magnetic dynamo provides a feed- 
back mechanism that causes convergence in 
the spins of stars that have the same age but 
different initial rates. Observationally, that 
convergence seems to occur by about the age 
of the nearest loosely bound star cluster, the 
Hyades — that is, 600 million years or so. 

Star clusters are fundamental to studies of 
angular-momentum loss because they provide 
good-sized samples of stars sharing the same 
composition and age. But the problem is that 
there are few clusters older than the Hyades. 
Clusters get ripped apart as they orbit in our 
Galaxy owing to tidal forces from objects 
such as giant molecular clouds or black holes 
(the source of the effect is still poorly under- 
stood). Such break-up accounts for why stars 
are spread all over the sky, as opposed to being 
concentrated in clusters, but it also means that 
clusters older than 0.5 billion years or so are 
inherently rare, and thus few are near the Sun. 
Compounding the difficulty, the faint stars 
in these more distant clusters have fewer and 
smaller star spots. These are regions of lower 
temperature than the surrounding surface 
that cause the star to dim and brighten when 
they rotate in and out of view, and so can be 
used to measure the star’s spin. The few small 
spots on older stars yield variations in starlight 
that are hard to detect using ground-based 
telescopes. 

That is why the capabilities of a mission
such as NASA’s Kepler satellite, with its exquisite
measurements of stellar brightnesses,
are essential, and so why Meibom and colleagues’
work, which is based on Kepler data,
matters. The authors used the satellite to 
measure the rotation period of 30 cool stars 
in the star cluster NGC 6819 (Fig. 1), which is 
about 2.5 billion years old. NGC 6819 there- 
fore fills the large gap in age between the Sun 
and existing cluster observations. By using 
methods such as the study of rotation, Kepler 
has revolutionized stellar physics as much as 
it has the study of exoplanets. It means that 
we can consider using the slow, steady spin- 
down of Sun-like stars as a way of determining 
their ages. 

But this method of estimating a star’s age 
has limitations. The main one is that we do not 
understand the physics of rotation and angu- 
lar-momentum loss in Sun-like stars, and the 


rotation–age relation remains purely empirical.
This is fine if we can calibrate that relation well 
and if there is a tight correspondence between 
rotation and age. But stars can acquire extra 
angular momentum late in their lives by swal- 
lowing a companion, such as another star or 
a planet. Orbiting objects have much more 
angular momentum than has a star, and so 
even small bodies can be significant. We know 
of no means by which a star can have its angu- 
lar momentum stolen, and so these accumula- 
tions add a systematic uncertainty. 

Also, as noted, older stars have at best only 
weak variations in their light. With even the 
highest-precision measurement, we cannot 
always see the signal of the Sun’s rotation. 
Likewise, not all stars reveal their rotation to 
us even when we badly want them to; nature is 
indifferent to our curiosity. But persistence can 
pry out those secrets. Meibom and colleagues’
study shows exactly why we develop new capa- 
bilities: there is the good reason (in the case of 
Kepler, it was to find Earth-like planets, a very 
good reason), and then there is the real reason, 
which is to enable clever people to do what was 
not foreseen at the start.
Seeing the wood
and the trees 


The identification of the gene regulatory network that controls the formation 
of xylem — the major component of wood — opens up new avenues for 
manipulating plant biomass. SEE ARTICLE P.571 


ANTHONY BISHOPP & MALCOLM J. BENNETT 


Cellulose, hemicelluloses and lignin are
key natural polymers that make up the
bulk of plant biomass’. These biopolymers
are also renewable resources for the
production of dietary fibre, paper and bio- 
fuels’. On page 571 of this issue, Taylor-Teeples 
et al.’ report the identification of the gene regu- 
latory network that controls the synthesis of 
these biopolymers in root xylem cells of the 
model plant Arabidopsis thaliana. 

Xylem is a plant tissue that provides 
mechanical support and the main mechanism 
for transporting water and nutrients from 
root to shoot tissues. To perform these impor- 
tant functions, xylem cells deposit a specially 
reinforced structure termed the secondary cell 
wall' (Fig. 1). Xylem secondary cell walls are 
composed mainly of cellulose, hemicelluloses 
and lignin. The cellulose forms a network of 
load-bearing fibres coated in hemicelluloses 
and embedded in lignin, providing mechanical 
strength and rigidity (akin to steel rods set in 
reinforced concrete). However, the presence 
of lignin is a major impediment to efficient 
extraction of the sugars in cellulose and hemi- 
celluloses for their conversion to biofuels’. 
Hence, understanding how the relative pro- 
portions of these biopolymers are controlled 
in plant tissue would open up opportunities 
to redesign plants for biofuel use.

Xylem cells control the relative abundance 
of biopolymers in part by regulating expres- 
sion of the genes that encode the enzymes for 
polymer synthesis*. Expression is controlled 
by transcription-factor proteins that bind 
DNA sequences, termed promoters, close to 
the genes. A handful of transcription factors 
have been identified that control the expres- 
sion of individual genes regulating the produc- 
tion of cellulose, hemicelluloses and lignin. 
But this small-scale, gene-by-gene approach 
has provided a highly fragmented picture of 
the potential regulatory interactions between 
xylem-associated transcription factors and 
their gene targets. 

Taylor-Teeples et al. instead adopted a 
network approach — screening more than 
460 transcription factors expressed in the root 
xylem of A. thaliana for their ability to bind the 
promoters of around 50 previously character- 
ized genes that encode cell-wall components or 
other transcription factors involved in xylem 
formation. This large-scale analysis provided a 
remarkable overview of the regulatory process, 
revealing a highly interconnected network 
composed of some 240 genes and more than 
600 new protein-DNA interactions. 

The xylem regulatory network shows 
that each cell-wall gene is bound, on average,
by 5 different transcription factors, each
belonging to one of 35 distinct families of
regulatory proteins. This regulatory arrange- 
ment provides a huge number of combina- 
torial possibilities, which Taylor-Teeples 
and colleagues show is crucial for integra- 
ting environmental signals such as salt or 
iron stress — alterations in the expression of 
certain transcription factors allowed different 
sub-networks to be used to adapt the cellular 
response to these conditions. 

The network also reveals that many of the 
transcription factors are not part of simple 
linear pathways, but form a series of feed- 
forward loops (FFLs). Such regulatory sys- 
tems are well recognized in systems biology, 
and typically involve a transcription factor that 
controls the expression of other transcription 
factors, which then collectively co-regulate 
their target genes. For example, the authors 
find that the transcription factor E2Fc binds 
to more than 20 promoters, including those for 
the genes encoding the transcription factors 
VND6, VND7 and MYB46, as well as genes 
associated with cellulose, hemicellulose and 
lignin production. Although FFLs are com- 
mon in biological systems’, they are remarka- 
bly numerous in the xylem network, occurring 
close to 100 times. They are also frequently 
embedded within one another, creating FFL 
cascades. For example, the network shows that 
VND7 and MYB46 also bind to the promoters 
of many E2Fc target genes. 
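
The loop structure described above can be made concrete with a small Python sketch that searches a directed network for feed-forward loops, that is, triples in which a source regulates both an intermediate and a target, and the intermediate regulates the same target. The regulators E2Fc, VND6, VND7 and MYB46 are named in the text; the target names CELLULOSE_GENE and LIGNIN_GENE and the specific edges are placeholders assumed for illustration, not data from the study.

```python
# Hypothetical regulator -> bound-promoter map. The transcription factors are
# named in the text; the targets and individual edges are illustrative only.
EXAMPLE_NETWORK = {
    "E2Fc": {"VND6", "VND7", "MYB46", "CELLULOSE_GENE", "LIGNIN_GENE"},
    "VND7": {"CELLULOSE_GENE"},
    "MYB46": {"LIGNIN_GENE"},
}


def find_ffls(edges):
    """Return sorted (source, intermediate, target) feed-forward loops.

    A triple qualifies when source -> intermediate, intermediate -> target
    and source -> target are all present in the network.
    """
    ffls = []
    for source, source_targets in edges.items():
        for intermediate in source_targets:
            # Nodes without outgoing edges cannot act as intermediates.
            for target in edges.get(intermediate, ()):
                if target in source_targets and target not in (source, intermediate):
                    ffls.append((source, intermediate, target))
    return sorted(ffls)


if __name__ == "__main__":
    for loop in find_ffls(EXAMPLE_NETWORK):
        print(" -> ".join(loop))
```

In this toy network the search finds two loops, both rooted at E2Fc. The same brute-force scan over a network of a few hundred genes would remain cheap, which is one reason this motif is simple to count once the binding map is known.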

So why are there so many FFLs? Not only 
are there many possible components, but 
even for simple systems with only three 
components, there are many possible ways 
to wire them®. Common to all FFLs is a 
direct path (in which a source transcrip- 
tion factor regulates a target gene) and an 
indirect path (in which the same source fac- 
tor regulates an intermediate transcription 
factor that regulates the same target). 
For ‘coherent’ FFLs, the direct and indirect 
paths have the same overall effect on the tar- 
get gene (both activate or repress its expres- 
sion), whereas for incoherent FFLs, one path 
activates and the other represses. Mathemati- 
cal modelling of these loops has revealed that 
different arrangements can produce a range 
of responses from target genes. For example, 
coherent FFLs can protect against unwanted 
responses to fluctuations in inputs, whereas 
incoherent FFLs can speed up transcrip- 
tional responses’. In the case of the xylem 
network, a coherent FFL could result in 
tight regulation of cell-wall gene expression, 
thereby promoting secondary cell-wall syn- 
thesis in a switch-like manner to prevent the 
deposition of secondary cell-wall material in 
non-xylem cells. 
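
The distinction between coherent and incoherent loops reduces to comparing the sign of the direct path with the product of the signs along the indirect path. The short Python sketch below encodes that rule, using +1 for activation and -1 for repression; the function name and the sign convention are assumptions made for illustration.

```python
from itertools import product


def classify_ffl(direct, source_to_mid, mid_to_target):
    """Classify a feed-forward loop from the signs of its three edges.

    Each sign is +1 (activation) or -1 (repression). The indirect path has
    the sign of the product of its two edges; the loop is coherent when
    that matches the sign of the direct edge.
    """
    for sign in (direct, source_to_mid, mid_to_target):
        if sign not in (1, -1):
            raise ValueError("edge signs must be +1 or -1")
    indirect = source_to_mid * mid_to_target
    return "coherent" if direct == indirect else "incoherent"


def count_wirings():
    """Count coherent and incoherent loops over all eight sign choices."""
    counts = {"coherent": 0, "incoherent": 0}
    for signs in product((1, -1), repeat=3):
        counts[classify_ffl(*signs)] += 1
    return counts
```

Enumerating all sign combinations for three components gives eight possible wirings, split evenly between the two classes, which illustrates why even the simplest loops offer many options.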

Figure 1 | Building a secondary cell wall. The cell
wall of plant xylem cells (a tracheary-element cell
grown in vitro is shown) contains several layers,
including an outer granular matrix, a primary cell
wall composed mainly of the biopolymer cellulose
and an inner secondary cell wall composed of
microfibrils of cellulose and lignin. Taylor-Teeples
et al.” have characterized the gene regulatory
network that determines the production of xylem
biopolymers. (Figure adapted from ref. 7.)

It is not yet possible to determine exactly
what types of FFL are present in the xylem
regulatory network described by Taylor-Teeples
et al., because although the technology
used by the authors identifies interactive
nodes, it cannot predict whether they relate
to transcriptional activation or repression.
However, these nodes provide a framework
for future research to characterize key interactions
in a targeted, gene-by-gene manner,
and to determine the precise regulatory
structure. This will allow the identification of
ways to manipulate this network to engineer 
different cellular properties and develop new 
plant varieties for biofuel use. The descrip- 
tion of the network also helps to explain why 
plant transcription factors have so far largely 
eluded identification by genetic screens, owing 
to functional redundancy among regulators 
of secondary-cell-wall biosynthesis. This 
knowledge can now be used to perform more- 
precisely targeted screens of gene function, by 
creating combinations of mutations that over- 
come this genetic redundancy.
Relativity tested
with a split electron 


Splitting and recombining an electron wave packet has been used to test relativity 
at a record sensitivity. The result heralds an era of precision measurements of 
relativity using quantum-information methods. SEE LETTER P.592 


V. ALAN KOSTELECKY 


Lorentz invariance is a fundamental
property of space-time that lies at the
heart of Albert Einstein's special theory
of relativity. Writing in this issue (page 592), 
Pruttivarasin et al.' show how they have tested 
Lorentz invariance for electrons at an unprec- 
edented sensitivity by splitting and recombin- 
ing a superposition of electron wave functions 
— an electron wave packet — that is bound 
to calcium ions. This ingenious experiment 
opens the door to a new generation of preci- 
sion tests of relativity using quantum-informa- 
tion techniques. 
Lorentz invariance states that the laws of 
physics that govern a physical system are 
unchanged for different system orientations
or velocities. Equivalently, the laws of physics
exhibit rotation symmetry (spatial isotropy) 
and boost symmetry. In practice, it is easier 
to rotate an apparatus than to boost it, and 
therefore many tests of relativity are designed 
to explore the spatial isotropy of the behav- 
iour of a physical system. Pruttivarasin and 
colleagues’ experiment can be viewed as a 
quantum analogue of two famous tests of 
spatial isotropy: the Michelson-Morley 
experiment for electrodynamics’ and the 
Hughes-Drever experiment for matter**, 

The Michelson-Morley experiment uses a
device called an interferometer that first splits 
a light ray into two beams travelling along 
orthogonal paths (arms), and then reflects and 
recombines the beams to yield an interference 
pattern. Monitoring the interference pattern as 
the orientation of the apparatus changes can be
understood as testing the constancy of the speed 
of light and hence the rotation symmetry of the 
laws of electrodynamics. If the speed of light is 
different in different directions, then the travel 
times of the two beams vary as the apparatus is 
rotated, changing the interference pattern. 
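
A toy calculation shows why rotating the apparatus is informative. The Python sketch below assumes a purely hypothetical direction dependence of the speed of light, c(θ) = c₀(1 + ε cos 2θ), and computes the difference in round-trip travel time between two equal, orthogonal arms as the interferometer is turned. The functional form, the parameter ε and the function names are assumptions for illustration only and do not describe any real experiment's analysis.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, metres per second


def light_speed(theta, eps):
    """Hypothetical direction-dependent light speed; eps = 0 is isotropic."""
    return C0 * (1.0 + eps * math.cos(2.0 * theta))


def arm_delay(theta, arm_length, eps):
    """Round-trip time of arm 1 minus that of the orthogonal arm 2, in seconds.

    theta is the orientation of arm 1 in radians; arm 2 sits at theta + 90
    degrees. Both arms have the same length in metres.
    """
    t_arm1 = 2.0 * arm_length / light_speed(theta, eps)
    t_arm2 = 2.0 * arm_length / light_speed(theta + math.pi / 2.0, eps)
    return t_arm1 - t_arm2


if __name__ == "__main__":
    # Scan a quarter turn for an assumed anisotropy of one part in 10**9.
    for degrees in (0, 15, 30, 45, 60, 75, 90):
        delay = arm_delay(math.radians(degrees), 1.0, 1e-9)
        print(f"{degrees:3d} deg: {delay:+.3e} s")
```

With ε set to zero the delay vanishes at every orientation, so the interference pattern stays fixed. Any non-zero ε makes the delay change sign over a quarter turn, which is the kind of orientation-dependent shift the experiment looks for.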

By contrast, the Hughes-Drever experiment
studies the spatial isotropy of the propaga- 
tion of matter by examining the connection 
between its energy and its momentum. In 
essence, the experiment involves placing one 
or more atoms in a magnetic field, thereby 
splitting some atomic energy levels and 
imbuing the system with a definite orienta- 
tion. Monitoring the frequency of transitions 
between certain energy levels for a changing 
orientation of the whole apparatus provides a 
check of isotropy. If the energy levels depend 
on the momentum directions of the atomic 
constituents, the transition frequency will 
change as the apparatus is rotated. 

In their study, Pruttivarasin et al. take 
advantage of the quantum nature of the 
electron, realizing an electron analogue of 
the Michelson-Morley and Hughes-Drever 
experiments through quantum-information 
techniques that allow a suitable electron wave 
packet to be created and monitored. The 
authors confined a pair of calcium ions (⁴⁰Ca⁺), 
about 16 micrometres apart, in an electromagnetic 
trap and applied a vertical magnetic field 
of 3.93 gauss to introduce a definite orientation 
of the system, which changes as Earth rotates. 
They then applied laser pulses to electrons 
bound to the calcium ions, creating an electron 
wave packet that combines two quantum states 
of different electron orientations relative to the 
magnetic field and that oscillates between two 
configurations. The researchers used further 
laser pulses to monitor the energy difference 
between the two states for 23 hours. The experi- 
mental procedure can be viewed as repeatedly 
splitting and then recombining the two quan- 
tum states 95 milliseconds later. Violations of 
spatial isotropy would be seen as variations in 
this energy difference as Earth rotates. 
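
One way to look for such a variation is to fit the recorded energy difference with a constant plus a sinusoid at Earth's rotation frequency. The sketch below is a generic least-squares fit written for illustration; it is not the authors' analysis code, and the function names, the sampling interval and the choice of harmonic are assumptions.

```python
import math

SIDEREAL_DAY_S = 86164.1  # Earth's rotation period relative to the stars, in seconds


def solve_linear(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_sidereal_modulation(times_s, values, harmonic=1):
    """Fit values ~ offset + c*cos(w t) + s*sin(w t); return (offset, amplitude)."""
    w = 2.0 * math.pi * harmonic / SIDEREAL_DAY_S
    basis = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times_s]
    # Normal equations (A^T A) p = A^T y for the three fit parameters.
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * v for row, v in zip(basis, values)) for i in range(3)]
    offset, c, s = solve_linear(ata, atb)
    return offset, math.hypot(c, s)
```

A fitted amplitude consistent with zero, within the measurement noise, would indicate no detectable violation of spatial isotropy at that harmonic.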

The experiment confirms spatial isotropy 
for electrons at the impressive level of one part 
in 10¹⁸. This represents a milestone sensitivity, 
because it is smaller than the dimensionless 
ratio of about 10⁻¹⁷ between the strengths 
of the electroweak and gravitational forces 
that could naturally be expected to govern 
violations of Lorentz invariance arising in 
unified theories of quantum physics and 
gravity. The authors’ experiment is thus the 
first to delve into this realm of sensitivity 
for electrons. 

A crucial subtlety in interpreting tests of 
Lorentz invariance and hence of special rela- 
tivity is that a physical reference system is 
required to define the lengths of rods, the tick- 
ing rates of clocks and the idea of orthogonal- 
ity in space and time. For example, in a classic 
Michelson—Morley experiment, the lengths 


and orthogonality of the interferometer arms 
are established in terms of properties of matter. 
The interpretation of the experiment as a test 
of the isotropy of the speed of light therefore 
relies on the assumption that these properties 
are independent of the system's orientation. 
Analogously, the interpretation of a Hughes- 
Drever experiment as a test of the rotation 
symmetry with matter assumes isotropic 
transition frequencies of light. Indeed, any 
measurement in physics is really a comparison 
between two systems, with only the difference 
between them being physically meaningful. 

It follows that Pruttivarasin and co-workers’ 
experiment can be viewed equally as an isotropy 
test for light assuming conventional electrons 
or as an isotropy test for electrons assuming 
conventional electrodynamics. In the first 
scenario, the experiment is interpreted as 
searching for possible spatial anisotropies in 
the Coulomb force that binds the electron 
wave packet to the calcium ions, and hence in 
the laws of electrodynamics, and the results 
represent a fivefold improvement in sensitivity 
over current limits. In the second picture, 
the energy of the electron wave packet depends 
on the direction of its momentum, and the new 
constraints sharpen existing bounds 100-fold. 

Possible experimental improvements include 
choosing different ions to yield a longer-lived 
electron wave packet, binding the wave packet 
to ions of greater charge, preparing the wave 
packet more directly, and taking data over a 
longer period. Another 100-fold improvement 
in sensitivity may lie within reach. 

Other tests of Lorentz invariance are also 
feasible using these methods. Lorentz 
violations accessible in principle include those 
characterized by nine coefficients, each corresponding 
to a different physical effect. Six 
coefficients govern violations of rotation sym- 
metry, whereas three control boost violations. 
Pruttivarasin et al. obtained constraints involv- 
ing four of the six types of rotation-symmetry 
violation (see Table 1 of the paper). The 
remaining two could be studied in a similar 
experiment mounted on a turntable with its 
axis of rotation differing from that of Earth. 
The three coefficients controlling boost vio- 
lations could also be measured with tenfold 
improved sensitivity using data acquired 
over many months, by taking advantage of 
the changing direction of Earth’s velocity as it 
revolves around the Sun. Stay tuned for future 
cutting-edge tests of relativity using these 
quantum-information techniques. 


CRISPR engineering 
turns on genes 


The repurposing of a bacterial defence system known as CRISPR into a potent 
activator of gene expression in human cells enables powerful studies of gene 
function, as exemplified in cancer cells. SEE ARTICLE P. 583 


SEUNG WOO CHO & HOWARD Y. CHANG 


The ability to turn on any gene at will has 
been a long-held dream of molecular 
biologists. Most genes are dynamically 
turned on and off by specific biological pro- 
cesses, and manipulation of the level of gene 
expression is a key method for studying 
the functions of each gene, regulatory ele- 
ment and pathway. On page 583 of this issue, 
Konermann et al. describe an elegant strategy 
for converting the CRISPR/Cas9 system — a 
bacterial defence system against foreign DNA — 
into a potent and selective gene activator. The 
authors demonstrate how this approach can be 
used to test the effects of turning on tens of 
thousands of individual genes in parallel. 
CRISPR, which stands for clustered regu- 
larly interspaced short palindromic repeats, is 
the name given to regions of bacterial DNA 
encoding RNA sequences that recognize 
foreign DNA sequences, such as those in 
viruses, through direct sequence complemen- 
tarity. In bacteria, these guide RNAs assemble 
into complexes with the enzyme Cas9 or other 
proteins that specifically cut the recognized 
Figure 1 | RNAs as modular scaffolds for gene regulation. a, Konermann et al. have engineered 
natural CRISPR short guide RNA molecules (sgRNAs) to become modular scaffolds for protein assembly, 
thus forming complexes that can specifically regulate gene expression through the action of multiple 
transcription-regulatory domains. Aptamer structures in the sgRNAs recruit specific proteins: in this 
case, RNA-binding proteins fused to the transcription-activating domains of different transcription 
factors. The complex’s DNA-targeting specificity is provided by base pairing between a different part of 
the CRISPR RNA and the target DNA sequence. The enzyme dCas9, which associates with CRISPR RNA 
and helps to unwind the DNA, can also be fused to another regulatory domain, thereby increasing the 
diversity of effects. b, This engineering of the CRISPR system incorporates design principles from natural 
long non-coding RNAs (lncRNAs), which can recruit multiple cellular machines that regulate gene 
expression by modifying chromatin (the complex of histone proteins and DNA in the cell nucleus). 


DNA sequence, thereby destroying it and 
protecting the bacterium from invasion. 
CRISPR/Cas9 thus represents a programmable 
DNA-targeting system, with its specificity 
determined by the RNA sequence. 
Molecular biologists have adopted this 
system to allow the rapid mutation or replace- 
ment of genomic sequences — a strategy called 
genome editing. A breakthrough came in 2012, 
when the system was simplified to use single 
short guide RNA (sgRNA) molecules to programme 
CRISPR specificity. Soon after, it was 
found that a mutant of Cas9 that no longer cuts 
DNA, termed dCas9, can be used as a DNA-binding 
platform. When dCas9 is linked to 
portions (domains) of proteins involved in 
transcriptional activation or repression and 
then targeted, using CRISPR, to promoter 
sequences that regulate transcription of 
particular genes, these fusion proteins can modulate 
natural gene-expression levels. However, 
the change in gene expression achieved by this 
approach is too low — less than or around 
fivefold activation — for many applications. 
Konermann and colleagues overcame this 
low efficiency of gene activation by turning 
the CRISPR sgRNA into a modular platform 
for assembling multiple different transcrip- 
tional activators (Fig. 1a). They identified two 
regions of the sgRNA that can be appended 
with short sequences that attract an RNA- 
binding protein, which is in turn fused to the 
transcription-activation domains of differ- 
ent mammalian transcription factors. The 


authors termed this system the synergistic 
activation mediator (SAM), and demonstrate 
that it induced more than 100-fold activation 
of 12 genes that were not efficiently activated 
by the dCas9-activator fusion protein. 

To illustrate the potential applications of this 
approach, Konermann et al. created a library 
of engineered sgRNAs that allowed more than 
23,000 human genes to be individually turned 
on. They then asked which genes, on activa- 
tion, give melanoma cancer cells the ability to 
escape the killing effects of the drug PLX-4720, 
a mainstay of melanoma treatment. The degree 
of drug resistance conferred by turning on dif- 
ferent genes was determined by the relative 
frequency of sgRNAs in the melanoma cells 
after drug treatment. The highly enriched 
sgRNAs included those corresponding to 
genes involved in known drug-resistance 
pathways and to genes that are expressed at 
increased levels in patients with drug-resistant 
melanoma — verifying that the SAM method 
can identify biologically relevant outcomes of 
altered gene expression. 
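
The enrichment calculation behind such a screen can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the guide identifiers used in any call, the pseudocount and the use of a log2 ratio of relative frequencies are assumptions.

```python
import math


def log2_enrichment(counts_before, counts_after, pseudocount=1.0):
    """Log2 change in each sgRNA's relative frequency after drug treatment.

    counts_before and counts_after map sgRNA identifiers to sequencing read
    counts. A pseudocount (an assumption of this sketch) avoids division by
    zero for guides that drop out entirely.
    """
    n_guides = len(counts_before)
    total_before = sum(counts_before.values()) + pseudocount * n_guides
    total_after = (
        sum(counts_after.get(g, 0) for g in counts_before) + pseudocount * n_guides
    )
    scores = {}
    for guide, before in counts_before.items():
        after = counts_after.get(guide, 0)
        freq_before = (before + pseudocount) / total_before
        freq_after = (after + pseudocount) / total_after
        scores[guide] = math.log2(freq_after / freq_before)
    return scores


def rank_guides(scores):
    """Guides sorted from most enriched to most depleted."""
    return sorted(scores, key=scores.get, reverse=True)
```

Guides with large positive scores are those whose target genes, when activated, let cells survive the drug; in practice several guides per gene would be combined before a gene is called a hit.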

The success of this CRISPR engineering 
effort has direct parallels with natural mecha- 
nisms of gene regulation. Enhancers are DNA 
sequences that turn on gene expression, and 
they typically contain recognition sequences 
for several different types of transcription 
factor. Moreover, enhancers and other regula- 
tory elements often generate long non-coding 
RNAs (lncRNAs), which act as modular scaffolds 
to recruit diverse cellular machines that 
modify chromatin (the complex of histone 
proteins and DNA in the cell nucleus) and 
thereby regulate gene expression (Fig. 1b). 
LncRNAs are molecular recipes writ large, 
containing both instructions for the set of 
biochemical activities to be assembled and 
the genomic address at which these activities 
should be carried out. 

Two other recent studies have also dem- 
onstrated improved efficiency or flexibility 
of CRISPR-guided gene regulation, through 
repurposing dCas9 or CRISPR RNA as modular 
scaffolds. One of those studies further 
showed that multiple engineered CRISPR 
RNAs can simultaneously turn different genes 
in the same cell on and off to manipulate a 
metabolic pathway. Together with Koner- 
mann and colleagues’ findings, these studies 
show that mimicking natural lncRNAs is an 
efficient way to orchestrate multiple proteins 
to work together across the genome. It may be 
possible to deliver defined combinations of 
various effector proteins to the same genomic 
location using one sgRNA molecule. Future 
construction of novel multifunctional artifi- 
cial proteins or non-coding RNAs will also be 
worthwhile, given the broad usefulness of such 
tools in biotechnology. 

This next generation of CRISPR technol- 
ogy opens the door to studying the functions 
of many genes and DNA sequences. RNA- 
interference techniques have been widely used 
for studying the effects of loss of gene func- 
tion over the past decade, but this approach 
can yield a high rate of false-positive results 
due to nonspecific targeting. Meanwhile, 
gain-of-function studies using overexpression 
techniques may not recapitulate normal RNA 
regulatory processes, such as alternative splic- 
ing. Artificial transcription factors based on 
DNA-binding proteins called zinc fingers 
and TALENs are alternatives for altering gene 
expression, but these are difficult to construct 
on a genome-wide scale. Thus, the compre- 
hensive coverage of CRISPR libraries and the 
modular nature of this approach are strong 
advantages over other techniques. However, 
CRISPR targeting may also have off-target 
effects, and additional validation experiments 
may be needed to confirm any effects 
of altered gene expression identified using 
this approach. 

In their melanoma-cell experiments, 
Konermann et al. identified 13 genes whose 
altered expression was individually sufficient 
to confer drug resistance. However, diseases or 
profound biological effects often result from 
complex regulation of multiple genes at the 
same time — a good example is the finding 
that four genes must be expressed together for 
the generation of induced pluripotent stem 
cells (a form of stem cell generated from adult 
cells). Thus, we will need a detailed understanding 
of regulatory networks and will 
need to experiment with gene sub-libraries 
and dosages to identify the sets of genes that 
together determine certain characteristics. 
The CRISPR/Cas system will be a versatile 
tool for this purpose, owing to its capacity for 
multiplexed targeting and, now, multiplexed 
deployment of diverse effector domains. 


Free and forced 
climate variations 


A combination of simulations and data shows that short-term climate trends 
are dominated by natural internal variations, providing a basis for climate 
forecasting, but not for assessing sensitivity to forced changes. SEE ARTICLE P.565 


JAMES RISBEY 


The global mean surface air temperature 
of the planet has increased over the past 
century, punctuated by periods of faster 
and slower warming lasting a decade or more 
at a time (Fig. 1). The apparent ‘slowdown’ in 
the rate of surface warming in the past 15 years 
is common in the record (as are accelerations), 
but has nonetheless drawn questions about the 
sources of fluctuations in the warming rate and 
about the implications of these fluctuations 
for climate-model projections. On page 565 
of this issue, Marotzke and Forster describe 
how they have used observations and climate- 
model runs to show that these fluctuations 
are dominated by natural (free) variations, 
and that decadal-scale trends do not provide 
grounds for revising the sensitivity 
of climate models to forcings 
such as increased concentrations 
of greenhouse gases. 

It has long been known that 
the global surface temperature 
fluctuates on multidecadal timescales 
in response to natural variations 
in ocean circulation and 
other processes. Since at least the 
1930s, climatologists have used 
30-year averages as a standard 
climate ‘normal’ to smooth out 
decadal variations. The 1995 
Intergovernmental Panel on 
Climate Change report identified 
volcanic activity, variations 
in incoming solar radiation, and 
ocean circulation as drivers of 
decadal and longer variability 
in global surface temperature. 
An innovation of Marotzke 
and Forster’s work is their use 
of coupled ocean-atmosphere 
climate-model runs to extract the contributions 
of processes internal to the climate 
system (free variation) that are due mainly 
to ocean circulation and its coupling with the 
atmosphere, and the contributions of external 
processes (forced variation) such as volcanoes, 
variations in greenhouse-gas concentrations 
and solar output. 


Figure 1 | Fifteen-year temperature trends. The solid black line is the time 
series of the global mean surface temperature, plotted as a departure (anomaly) 
from a baseline period 1961-90. The dashed black line is a smooth fit to this 
series, representing the long-term warming rate. The blue and red lines are linear 
trends for each 15-year segment running over 1850-64, 1851-65, ..., 1999-2013. 
Each 15-year segment is shown in red if the trend rises faster than the long-term 
warming rate in the same 15-year period and blue if it rises more slowly. Marotzke 
and Forster show that these 15-year trends are dominated by natural (free) 
variations. The free variations drive the 15-year trends above and below the 
long-term warming rate as they ride along with it. 


Marotzke and Forster provide the first 
quantification of these contributions for 
trends of length 15 years and 62 years in the 
century-long instrumental climate record. 
Their method uses the statistical tool of 
multiple regression on components of the 
surface energy balance (relating changes in 
forcing and temperature) to fit global surface-temperature 
trends in models, on the basis of 
the model forcing and response. The method 
does not account for any systematic errors 
in climate-model representations of forcing, 
and may under-represent some slow climate 
responses that alter temperatures on time- 
scales longer than the trend periods exam- 
ined. Furthermore, the method attributes all 
residual between the model-fitted trends and 
the actual model trends to free variation. These 
assumptions mean that the method may over- 
estimate the free contribution to the model 
trends, but trends predicted in this way are 
consistent with those observed. The results 
show that the free variations are the dominant 
contributor to model 15-year trends, and are 
also important, but less so than forced varia- 
tions, for the model 62-year trends. The results 
provide model-based confirmation that free 
variations dominate 15-year trends across the 
instrumental period. 
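
The overlapping 15-year trends that this discussion rests on are simple to compute. The sketch below fits an ordinary least-squares line to every 15-year window of an annual temperature series and sorts the windows by whether they warm faster or slower than a reference rate. It is an illustration only: the function names are hypothetical, and comparing against a single constant long-term rate is a simplifying assumption (the figure compares each window with a smooth fit over the same period).

```python
def linear_trend(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def running_trends(years, anomalies, window=15):
    """(start_year, slope) for every run of `window` consecutive years."""
    return [
        (years[i], linear_trend(years[i:i + window], anomalies[i:i + window]))
        for i in range(len(years) - window + 1)
    ]


def split_by_long_term_rate(trends, long_term_rate):
    """Start years of windows warming faster, and slower, than the reference rate."""
    faster = [start for start, slope in trends if slope > long_term_rate]
    slower = [start for start, slope in trends if slope <= long_term_rate]
    return faster, slower
```

For an annual series covering 1850 to 2013 this yields 150 windows, from 1850-64 to 1999-2013.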

The other big issue addressed by Marotzke 
and Forster is whether the coupled climate 
models have overestimated the sensitivity of 
the climate system to forced changes. Some 
have claimed that the recent slowdown in 
surface warming is not well captured in 
climate-model projections, and further, that 
it implies that climate models may have over- 
estimated the response of the climate system 
to increases in greenhouse gases. 

A range of studies have now addressed 
the first claim using a variety of approaches. 
Some of these address the issue that a single 
15-year period in the real world 
is not by design synchronized 
with the same 15 years in model 
projections. The free variations 
in the latter are not, and cannot 
be, synchronized with the real 
world. This means that neither 
a single model run, nor the 
average of many model runs, 
is expected to match a given 
observed 15-year trend. How- 
ever, by selecting model runs 
according to criteria that effec- 
tively phase-lock the model 
free variation to the observed free 
variation, the models provide 
good representations of observed 
15-year trends. 

The observed temperature 
trends in the most recent 15-year 
period in which the slowdown 
occurs can also be reproduced 
in the models by specifying 
the observed surface winds to 


synchronize the model free variation. Other 
work shows that, by examining all 15-year 
trends in observations in the instrumental 
record and a large ensemble of 15-year trends 
in climate projections, there is no systematic 
bias in the model trends. 

The response of a climate model and the 
climate system over a set period depends on 
the length of the period examined. For longer 
periods, the change in forcing produces a 
change in temperature, which is in turn ampli- 
fied through other responses of the climate sys- 
tem, primarily related to the phases of water 
(vapour, cloud, ice). The amplification pro- 
cess is termed the climate feedback. If climate 
feedbacks are too strong in a model, then the 
model may overestimate the forced response 
over a period. 

Using measured feedbacks and forcings in 
climate models, Marotzke and Forster show 
that the contribution of spread in climate feed- 
back to spread in model temperature trends 
over 15-year periods is small compared with 
those from free variations and direct forcing. 
This means that comparisons of observed and 
model 15-year trends can yield information 
about free variations in the model, but not 
about climate feedback or sensitivity, because 
climate feedback plays little part on this time- 
scale. Even allowing for some uncertainty in 
Marotzke and Forster’s method, the results are 
emphatic enough for there to be little room 
for concluding that climate models are over- 
estimating the response of the climate system 


on the basis of short-trend comparisons. Such 
ad hoc revisions of climate sensitivity are 
also problematic because changes in forcing 
associated with aerosol emissions are hard to 
quantify (confounding the forced change), and 
accurate estimates of ocean heat content are 
not available before about 2006. 

With the work of Marotzke and Forster, 
we now have a clearer view of the contribu- 
tion of free and forced variations to 15-year 
trends. The climate system and climate models 
generate a long-term warming in response to 
steadily growing forcing by greenhouse-gas 
increases. On shorter timescales, the warm- 
ing is not steady, but speeds up and slows 
down (Fig. 1) in response to free variations in 
the climate system, and much less so due to 
forced variations from solar cycles, volcanic 
eruptions and other aerosol sources. The (non- 
greenhouse-gas) forced variations can have a 
larger role in any particular 15-year period, but 
because they are irregular (volcanoes) and/or 
weak (solar variations), their role is gener- 
ally much smaller when viewed across many 
15-year periods. 

The source of the free variations is in the 
intrinsic circulation modes in the ocean and 
their coupling to intrinsic modes in the atmosphere. 
Some of this variation is predictable 
and is stimulating the development of initialized 
climate forecasts. These hold out the 
promise of better adaptation to climate vari- 
ability on short timescales. The free variations 
and occasional enhanced forcing by aerosols 
and solar cycles drive fluctuations that ride on 
top of the longer-term warming response but 
do not subdue it. 


Risk factors and 
random chances 


The discovery that the estimated number of stem-cell divisions in a tissue 
correlates with cancer incidence suggests that the varying probability of 
developing cancer in different tissues is mostly down to random mutations. 


DOMINIK WODARZ & ANN G. ZAUBER 


Cancer arises through the accumulation 
of molecular changes that together 
allow cells to grow in an uncontrolled 
manner. Many factors contribute to this process, 
including hereditary genetic mutations 
and environmental hazards, such as exposure 
to smoking or radiation. But cancer can still 
emerge in the absence of these factors, as a 
result of random mutations that arise during 
cell division. Furthermore, little is known 
about how each risk factor affects different 
tissues, in which cancers arise at varying 
frequencies. Writing in Science, Tomasetti 
and Vogelstein show that about 65% of the 
differences in cancer incidence between 
tissues can be simply explained by the estimated 
total number of stem-cell divisions in 
those tissues. This result suggests that, rather 
than environmental and hereditary influences, 
stochastic accumulation of mutations during 
DNA replication is the major cause of variations 
in cancer incidence between tissues. 
Estimating the number of stem-cell 
divisions that occur in a tissue is a complex 
task. Tomasetti and Vogelstein performed 
an extensive literature search to obtain information 
about stem-cell numbers in various 
tissues. They then used mathematical and 
statistical models to estimate the number of 
stem-cell divisions in each tissue during the 
average human lifespan, and correlated this with 
publicly available cancer-incidence data. 
Although some cancers are certainly driven 
by genetic predisposition or environmental 
risk factors, the authors’ analysis indicated 
that these factors explain only one-third of the 
variation in cancer incidence between tissues 
(Fig. 1). The rest of the variability was explained 
by the different number of stem-cell divisions 
estimated to occur in different tissues. How- 
ever, some commonly occurring cancers, such 
as breast and prostate, were not included in the 
analysis, and these must be examined when 
data become available. Importantly, although 
this paper attempts to explain the variation in 
cancer incidence among tissues, it does not try 
to quantify the percentage of cancers that arise 
owing to random accumulation of mutations 
alone, which is a different measure. 
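
The phrase 'explains about 65% of the variation' refers to a squared correlation coefficient. The sketch below shows that calculation for a set of tissues; working with base-10 logarithms of both quantities is an assumption of this illustration, and any input values supplied to it are hypothetical rather than the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fraction_of_variance_explained(stem_cell_divisions, lifetime_risks):
    """Squared correlation between log10(divisions) and log10(risk) across tissues."""
    log_divisions = [math.log10(d) for d in stem_cell_divisions]
    log_risks = [math.log10(r) for r in lifetime_risks]
    r = pearson_r(log_divisions, log_risks)
    return r * r
```

As a matter of arithmetic, a correlation coefficient near 0.8 gives a squared value near 0.65. The figure describes how well one variable tracks the other across tissues; it is not the share of individual cancers caused by chance.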

This study suggests that processes governing 
the evolution of cancer cells need to be better 
understood, so that they can be manipulated to 
delay the onset of cancer. One drug that might 
affect evolutionary dynamics is aspirin, which 
protects against a variety of cancers. Aspirin 
modulates evolutionary parameters that deter- 
mine the rate of cancer development, including 
cell-division and cell-death rates, and the persistence 
of cells with elevated mutation rates. 

In addition to experimental studies, mathematical 
models are crucial for investigating 
the evolutionary dynamics of cancer. Such 
models have led to the development of several 
principles that define how random mutations 
and subsequent selection influence the emer- 
gence and growth of tumour cells in certain 
settings. For example, stochastic evolutionary 
models have been used to study the emergence 
and spread of tumour cells in healthy tissue. 
Furthermore, models that take into account 
stem-cell dynamics have been used to study 
the evolutionary pathways by which cancerous 
cells escape the constraints of tissue regulation. 

Nonetheless, much remains to be done, and 
Tomasetti and Vogelstein’s paper highlights 
some intriguing directions for future research. 
For instance, although the authors assumed 
that random cancer-causing mutations 
occur during stem-cell division, it is actually 
unknown in which tissues stem cells (rather 
than more-differentiated cells) become cancerous. 
The correlation they report could arise 
either way, because the number of stem-cell 
divisions probably correlates with the number 
of divisions undertaken by the differentiating 
daughters of stem cells. 

Another gap in our knowledge concerns 
the relative importance for cancer evolution of 
embryonic cell divisions compared with divisions 
after birth. Although many cell divisions 
occur after birth in most tissues included in this 
study, almost no cell division occurs after birth 
in the cells that give rise to glioblastoma, an 
aggressive brain tumour. But Tomasetti and 
Vogelstein found that the incidence of glio- 
blastoma fits the same trend as the other 
cancers they studied. 

Last but not least, we need to understand 
how various environmental selection pres- 
sures influence the fate of the mutant cells that 
are generated by chance. Microenvironmental 
conditions, such as immune responses, inflam- 
mation or the presence of cancer-causing mol- 
ecules, can affect the fitness of specific mutants 
and thus their ability to grow and give rise to 
disease. These conditions can change dur- 
ing ageing, resulting in environments that 
are conducive to the growth of cancerous 
cells. Although the correlation observed by 
Tomasetti and Vogelstein is certainly intrigu- 
ing, and although accumulation of mutations 
clearly has a central role in causing variability 
between tissues, much research is needed to 
disentangle the complex, multifactorial inter- 
actions that result in disease. 

A landmark paper published in 1981 
suggested that most cancers could be averted 
by removing various lifestyle, behavioural 
and environmental risk factors prevalent in 
the population, and thus that much of the 
risk of cancer could be controlled. Risk fac- 
tors are certainly involved in promoting 
the occurrence of many cancers, especially 
the more common ones. It therefore makes 
sense to use the presence or absence of risk 
factors to identify individuals who should 
Figure 1 | Mechanisms of cancer development. a, Exposure to environmental carcinogens can cause 
genetic mutations in a healthy cell population, leading to the generation of cancerous cells (two mutations 
are shown as sufficient for tumour growth here, although more are often required). b, If a mutation 
associated with cancer has been passed down from parents to offspring, the offspring has a hereditary 
predisposition to that cancer. Tumour growth is initiated if any of these cells randomly acquires a second 
mutation during cell division. c, In the absence of environmental risk factors and genetic predisposition, 
acquisition of random mutations alone can be sufficient to cause cancer. Tomasetti and Vogelstein report 
that this is the major cause of the variable rates at which cancer arises in different tissues. 


be screened for cancer. If, however, random 
genetic changes are more-relevant drivers of 
carcinogenesis, then biomarkers and early- 
detection methods will have to be developed 
to prevent cancer mortality in the general 
population. 

This might be problematic. Screening for 
many of the cancers included in Tomasetti 
and Vogelstein’s study is currently difficult, 
particularly for rare cancers. Screening is suc- 
cessful only when several criteria are fulfilled: 
that tests are available that detect early disease; 
that the tests’ sensitivity and specificity for the 
given cancer are high; that people are willing 
to be screened; and that effective treatments 
exist that can be applied to early-stage cancers 
to prevent death. Furthermore, the benefits 
of screening must outweigh the risks of offer- 
ing screening to the general population. The 
‘number needed to screen’ to prevent one can- 
cer death must be feasible, given the resources 
required to perform screens on the population 
as a whole (see go.nature.com/6lefy9). 
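
The 'number needed to screen' is conventionally the reciprocal of the absolute reduction in cancer mortality achieved by screening. A minimal sketch of that arithmetic follows; the function name and any mortality figures passed to it are hypothetical.

```python
def number_needed_to_screen(mortality_unscreened, mortality_screened):
    """People who must be screened to prevent one cancer death.

    Both arguments are cancer-specific death risks over the same follow-up
    period, expressed as fractions (for example 0.004 for 4 per 1,000).
    """
    absolute_risk_reduction = mortality_unscreened - mortality_screened
    if absolute_risk_reduction <= 0:
        raise ValueError("screening must lower mortality for the measure to be defined")
    return 1.0 / absolute_risk_reduction
```

For a rare cancer the absolute risk reduction is necessarily small, so the number needed to screen grows large, which is one reason population-wide screening for rare cancers is hard to justify.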

In summary, Tomasetti and Vogelstein’s 
findings emphasize the role of basic evolu- 
tionary mechanisms in cancer development, 
and might lead to new chemoprevention 
therapies that slow the evolutionary processes 
at work. Although screening according to 
risk factors remains a crucial intervention 
strategy, an improvement in our cancer- 
prevention efforts in the general population 
will require the generation of early-detection 
techniques. 
Forcing, feedback and internal variability in global temperature trends 


Jochem Marotzke & Piers M. Forster 


Most present-generation climate models simulate an increase in global-mean surface temperature (GMST) since 1998, 
whereas observations suggest a warming hiatus. It is unclear to what extent this mismatch is caused by incorrect model 
forcing, by incorrect model response to forcing or by random factors. Here we analyse simulations and observations of 
GMST from 1900 to 2012, and show that the distribution of simulated 15-year trends shows no systematic bias against the 
observations. Using a multiple regression approach that is physically motivated by surface energy balance, we isolate the 
impact of radiative forcing, climate feedback and ocean heat uptake on GMST—with the regression residual interpreted 
as internal variability—and assess all possible 15- and 62-year trends. The differences between simulated and observed 
trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings 
used to drive models over the longer timescale. For either trend length, spread in simulated climate feedback leaves no 
traceable imprint on GMST trends or, consequently, on the difference between simulations and observations. The claim 
that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas
concentrations therefore seems to be unfounded.


The GMST has risen in the past fifteen years at a rate that is only one- 
third to one-half of the average over the second half of the twentieth 
century (see, for example, refs 1-5). This hiatus is not reproduced in 
most simulations with present-generation climate models, which instead 
over the period 1998-2012 show a larger GMST trend than observed.
The difference between GMST observations and simulations is caused
in part by quasi-random internal climate variability, which arises
because of chaotic processes in the climate system. But part of the
difference is probably caused by errors in the model radiative forcing
or in the model response to radiative forcing. The
relative magnitudes of these three contributions are poorly known. Here 
we quantify how forcing, feedback and internal climate variability con- 
tribute to spread in simulated historical GMST trends and, hence, to 
the differences between models and observations. 

We use a three-pronged approach. First, we note that, owing to quasi-
random internal climate variability, the difference between observed
and simulated trends likewise contains quasi-random contributions. To
avoid focusing too strongly on the particular period 1998-2012, which
contains some climate extremes relevant for GMST and is hence
unlikely to be reproduced in a simulation containing quasi-random
contributions, we analyse GMST trends of a certain length for the entire
period 1900-2012. Second, we quantify the contributions of forcing,
climate feedback, ocean heat uptake and internal variability to simu- 
lated GMST trends, through a multiple linear regression approach that 
is physically motivated by the global surface energy balance. And, third, 
we investigate trends over both 15 and 62 years, representing decadal 
and multidecadal timescales, respectively. We combine these three as- 
pects into a new unified conceptual framework, which allows us to put 
the GMST trends over the 15-year period 1998-2012 into the appro- 
priate context. 

We first create linear trends from an ordinary least-squares fit, and 
perform all statistical analyses on these trends. This procedure implies 
that the analysis must be repeated for each trend length, in contrast to 
previous work aiming at attributing elements in the observed GMST 
time series itself; such elements include effects of volcanic eruptions,
solar variability, anthropogenic forcing, El Niño events and sources of
atmospheric dynamic variability including land-sea contrasts.
Because the amplitude of internal variability decreases with increasing
trend length, we expect a clearer breakdown into the individual con-
tributions from forcing, feedback and internal variability if we focus on
one trend length at a time. We analyse trends over both 15 and 62 years,
because these were the trend lengths primarily considered in the Inter-
governmental Panel on Climate Change Assessment Report 5 (AR5).
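
The trend construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `sliding_trends` and the synthetic temperature series are assumptions introduced here. It fits an ordinary least-squares line to every window of a given length and reports the slope in °C per decade.

```python
import numpy as np

def sliding_trends(years, gmst, length=15):
    """OLS linear trend (°C per decade) for every window of `length` years.

    Hypothetical helper for illustration; `years` and `gmst` are annual values.
    """
    years = np.asarray(years, dtype=float)
    gmst = np.asarray(gmst, dtype=float)
    n_windows = len(years) - length + 1
    start_years = years[:n_windows].astype(int)
    trends = np.empty(n_windows)
    for i in range(n_windows):
        t = years[i:i + length]
        y = gmst[i:i + length]
        # Centre the time axis to keep the fit well conditioned.
        slope = np.polyfit(t - t.mean(), y, 1)[0]  # °C per year
        trends[i] = 10.0 * slope                   # °C per decade
    return start_years, trends

# Synthetic series (an assumption): steady warming of 0.07 °C per decade.
years = np.arange(1900, 2013)
gmst = 0.007 * (years - 1900)
start_years, trends = sliding_trends(years, gmst, length=15)
```

With annual data for 1900-2012 this yields 99 fifteen-year windows, matching the number of observed trends quoted in the text; calling it with `length=62` gives the multidecadal trends.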


Observed and simulated 15-year trends 

To gauge whether the difference between simulations and observations 
is unusual over the hiatus period, we first compare observed and simu- 
lated 15-year trends over the entire period 1900-2012 (Fig. 1; see also 
ref. 13). We use the HadCRUT4 observational data set and the ‘his-
torical’ simulations conducted under the auspices of the Coupled Model
Intercomparison Project Phase 5 (CMIP5), extended for the years
2006-2012 with the RCP4.5 scenario runs (Extended Data Fig. 1 and
Extended Data Table 1). The simulation output is subsampled using
the HadCRUT4 data mask, to account for the effects of incomplete
observational coverage.
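
Subsampling with an observational mask can be illustrated as follows. This is a sketch under stated assumptions: the function `masked_global_mean`, the tiny 3 x 2 grid and the cosine-latitude weighting are choices made here for illustration and are not taken from the paper or from the HadCRUT4 processing chain.

```python
import numpy as np

def masked_global_mean(field, lat, mask):
    """Area-weighted mean of `field` (lat x lon) over cells where `mask` is True.

    Hypothetical helper: cells are weighted by cos(latitude), and cells
    without observational coverage (mask False) get zero weight.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    weights = np.where(mask, weights, 0.0)
    values = np.where(mask, field, 0.0)  # avoids NaN * 0 in unobserved cells
    return float((values * weights).sum() / weights.sum())

# Toy grid (an assumption): three latitude bands, two longitudes.
lat = np.array([-60.0, 0.0, 60.0])
field = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
full = np.ones_like(field, dtype=bool)
partial = full.copy()
partial[2, :] = False  # pretend the northern band is unobserved

mean_full = masked_global_mean(field, lat, full)
mean_partial = masked_global_mean(field, lat, partial)
```

Applying the same mask to model output and to observations means that both global means are built from the same set of grid cells, which is the point of the subsampling step.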

Figure 1a contains the joint relative frequency distribution of 15-year 
GMST trends across the 114 available CMIP5 simulations, as a func- 
tion of start years since 1900 and trend size. Compared with the CMIP5 
ensemble, observed trends are distributed in no discernibly preferred 
way and occur sometimes at the upper end of the ensemble (for exam- 
ple, for start year 1927 the best-estimate observed trend is larger than 
110 of the 114 simulated trends; Fig. 1b) and sometimes at the lower 
end of the ensemble (for example, for start year 1998 the best-estimate 
observed trend is smaller than all 114 simulated trends; Fig. 1c).
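
A joint relative frequency distribution of the kind shown in Fig. 1a can be built by histogramming the ensemble of trends separately for each start year. The sketch below is illustrative only: the function name, the bin range and the random synthetic ensemble are assumptions, while the bin width of 0.025 °C per decade follows the figure caption.

```python
import numpy as np

def joint_relative_frequency(trends, bin_width=0.025, lo=-0.5, hi=0.75):
    """Histogram trends per start year; each column integrates to one.

    trends: array of shape (n_members, n_start_years), in °C per decade.
    The range [lo, hi] is an assumption chosen to cover the synthetic data.
    """
    n_bins = int(round((hi - lo) / bin_width))
    edges = lo + bin_width * np.arange(n_bins + 1)
    density = np.empty((n_bins, trends.shape[1]))
    for k in range(trends.shape[1]):
        density[:, k], _ = np.histogram(trends[:, k], bins=edges, density=True)
    return edges, density

# Synthetic ensemble (an assumption): 114 members, 99 start years.
rng = np.random.default_rng(0)
trends = rng.normal(0.09, 0.08, size=(114, 99))
edges, density = joint_relative_frequency(trends)
```

Each column of `density` is a vertical cross-section for one start year, comparable in spirit to panels b and c of Fig. 1.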

In both cases depicted in Figs 1b, c, fewer than 5% of the simulations 
lie in one of the tails relative to the observed trend. Hence, if a 5% cri- 
terion for statistical significance were used, one would diagnose formal 
model-observation inconsistency for 15-year trends with start years 
1927 and 1998. But when the comparison is repeated for all start years,
the rank that the observed trend would have as a member of the
ensemble of simulated trends shows no apparent bias (Fig. 1e), indi-
cating that the observed and simulated distributions of 15-year trends
are broadly consistent with each other. Any position of the observed
trend within the ensemble of simulated trends, including a position
at or near the margin, is thus dominated by quasi-random effects (al-
though for any particular start year, a non-negligible contribution from
systematic errors cannot be excluded).

Figure 1 | Simulated and observed 15-year GMST trends since 1900.
a, Joint relative frequency distribution of GMST trends as a
function of start year and trend size, based on the
full 114-member ensemble (in bins of 0.025 °C per
decade, as shown by colour scale). Circles mark
the observed trend from the HadCRUT4 data set.
b, Vertical cross-section of a for start year 1927;
vertical line marks the observed trend. c, As b, but
for start year 1998. d, Marginal distribution of
simulated GMST trend as a function of trend size

The marginal distribution of simulated GMST trends as a function 
of trend size is wider than the observed distribution of trends (Fig. 1d), 
a finding consistent with that from the previous generation of climate 
models. The width is exaggerated owing to contributions arising at
three distinct periods. Some simulated trends with start years from 
around 1950-1960 are more strongly negative than any observed trends 
since 1900, and some simulated trends with start years from around 
1960-1970 and from around 1985-1998 are more strongly positive than 
any observed trends since 1900 (Fig. 1a). All three periods (1950-1960, 
1960-1970 and 1985-1998) are influenced by volcanic eruptions (Mount 
Agung in 1963 and Mount Pinatubo in 1991). We speculate that some, 
though not all, models overestimate the cooling induced by an eruption 
and the subsequent warming recovery (see, for example, ref. 12
concerning a confounding role of El Niño).

The mean over all simulated 15-year trends during the period 1900- 
2012 is 0.086 ± 0.001 °C per decade (mean ± s.e.m.; n = 11,186), in
excellent agreement with the observed 0.088 ± 0.01 °C per decade (n =
99). Furthermore, of all 11,186 pairwise comparisons that are possible 
between simulated and observed trends, the observed trend is higher in 
53.6% of cases, which is slightly above the 50% expected for a perfectly 
unbiased model ensemble. Figure 1 demonstrates that when viewed over 
the entire period 1900-2012, the 15-year GMST trends simulated by the 
CMIP5 ensemble show no systematic deviation from the observations. 
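
The summary statistics in this paragraph (a mean with its standard error, and the share of comparisons in which the observed trend is higher) can be sketched as below. The helper names and the toy numbers are assumptions made for illustration; the sketch also treats all trends as independent samples when computing the standard error, which overlapping trends are not.

```python
import numpy as np

def mean_and_sem(x):
    """Mean and standard error of the mean over all entries of x."""
    x = np.asarray(x, dtype=float).ravel()
    return x.mean(), x.std(ddof=1) / np.sqrt(x.size)

def fraction_observed_higher(simulated, observed):
    """Share of (member, start year) pairs with observed > simulated.

    simulated: shape (n_members, n_start_years); observed: (n_start_years,).
    Each simulated trend is compared with the observed trend that has
    the same start year.
    """
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(observed[None, :] > simulated))

# Toy data (an assumption): three members, two start years.
simulated = np.array([[0.0, 0.2], [0.1, 0.3], [0.2, 0.4]])
observed = np.array([0.15, 0.25])

sim_mean, sim_sem = mean_and_sem(simulated)
share = fraction_observed_higher(simulated, observed)
```

A share close to 0.5 is what an unbiased ensemble would be expected to give, which is the benchmark the text compares against.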

Our interpretation of Fig. 1 tacitly assumes that the simulated 
multimodel-ensemble spread accurately characterizes internal variabil- 
ity, an assumption shared with other interpretations of the position of 
observed trends relative to simulated trends (for example the reduction 
in Arctic summer sea ice). We now test the validity of this assumption,
by identifying deterministic and quasi-random causes of ensemble
spread. We exploit the availability of a large number of simulations— 
114 realizations with 36 different models, with forcing information
available for 75 realizations with 18 different models (Extended Data
Figs 1 and 2 and Extended Data Table 1)—and investigate the contri- 
butions of radiative forcing, climate feedback and ocean heat uptake 
to all simulated 15-year and 62-year GMST trends during the period 
1900-2012. 


Energy balance and multiple regression 


Our starting point is the globally averaged energy balance for the sur-
face layer. An increasing trend ΔF in effective radiative forcing (ERF)
causes an increasing trend ΔT in GMST. This in turn leads to increased
outgoing radiation, which in linearized form is written as αΔT, where α
is the climate feedback parameter. Furthermore, the GMST increase leads
to increased heat transfer from the surface layer to the subsurface ocean,
written, again in linearized form, as κΔT, where κ is the ocean heat
uptake efficiency. The thermal adjustment of the surface layer to ΔF is
expected to occur within a few years. This means that for timescales
of one to several decades, the surface energy balance is in quasi-steady
state and reads

$$(\alpha + \kappa)\,\Delta T = \Delta F$$

which produces the energy-balance ‘prediction’ for the GMST trend:

$$\Delta T = \frac{\Delta F}{\alpha + \kappa} \qquad (1)$$

Each CMIP5 model simulates its own ERF time series over the his-
torical period. These time series were diagnosed previously; if mul-
tiple realizations were available for a model, the ensemble average of
the individual diagnosed ERF time series for this model was given and
is used here. The individual α and κ values were previously determined
for each CMIP5 model from a regression of global top-of-atmosphere
energy imbalance against GMST, in turn based on simulations
in which the CO₂ concentration was quadrupled abruptly. The ranges
of α and κ are 0.6-1.8 and 0.45-1.52 W m⁻² °C⁻¹, respectively. That
α and κ in the CMIP5 models might vary with time and climate state
is ignored here. There is some positive, though not statistically signifi-
cant, correlation between α and κ (across the 75-member subensemble,
the correlation is 0.17 with P = 0.14).

Each model’s α value is related to its equilibrium climate sensitivity
(ECS) by

$$\mathrm{ECS} = \frac{F_{2\times}}{\alpha} \qquad (2)$$

where F₂ₓ is the effective radiative forcing from a doubling of the pre-
industrial atmospheric CO₂ concentration. The reference value for F₂ₓ
is 3.7 W m⁻² (see, for example, ref. 44), but F₂ₓ varies between 2.6 and
4.3 W m⁻² across the CMIP5 ensemble. To avoid confounding the
uncertainty in model response with the uncertainty from CO₂ forcing,
we use α and not ECS to characterize model response.
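
Equations (1) and (2) are simple enough to evaluate by hand, and a short sketch makes the orders of magnitude concrete. The function names and the example ERF trend of 0.3 W m⁻² per decade are assumptions introduced here; the α range and the reference forcing of 3.7 W m⁻² are the values quoted in the text. Because F₂ₓ actually differs between models, the ECS numbers below are illustrative bounds and not the sensitivities of any particular CMIP5 model.

```python
def energy_balance_trend(delta_f, alpha, kappa):
    """Equation (1): GMST trend (°C per decade) from an ERF trend
    (W m^-2 per decade), feedback alpha and ocean heat uptake
    efficiency kappa (both in W m^-2 °C^-1)."""
    return delta_f / (alpha + kappa)

def equilibrium_climate_sensitivity(f2x, alpha):
    """Equation (2): ECS (°C) from doubled-CO2 forcing and feedback."""
    return f2x / alpha

F2X_REF = 3.7  # reference doubled-CO2 forcing quoted in the text, W m^-2

# ECS implied by the ends of the quoted alpha range at the reference forcing.
ecs_low = equilibrium_climate_sensitivity(F2X_REF, 1.8)   # about 2.1 °C
ecs_high = equilibrium_climate_sensitivity(F2X_REF, 0.6)  # about 6.2 °C

# Hypothetical ERF trend and mid-range parameters (assumptions).
trend = energy_balance_trend(0.3, alpha=1.2, kappa=0.8)   # 0.15 °C per decade
```

The example shows why α, rather than ECS, is the cleaner measure of model response: the same α maps to different ECS values as soon as F₂ₓ changes.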

On the basis of the physical foundation of energy balance (equa-
tion (1)), we determine the extent to which the across-ensemble varia-
tions of ΔF, α and κ contribute to the ensemble spread of GMST trends
ΔT, using the 75-member subensemble of CMIP5 historical simula-
tions for which radiative forcing information can be obtained from the
CMIP5 archive (Extended Data Table 1). The presence of internal
variability is included in our framework by adding a random term to
equation (1), such that our equation is

$$\Delta T = \frac{\Delta F}{\alpha + \kappa} + \varepsilon \qquad (3)$$


Because equation (3) assumes an increasing trend in ERF, its validity 
is somewhat questionable following a volcanic eruption (see, for ex- 
ample, ref. 25). However, Extended Data Fig. 3 shows that overall we 
see a reliable relationship between ERF and GMST trends in the CMIP5 
ensemble, even if the ERF trend is negative. 

We make the connection to multiple linear regression by writing
each quantity as

$$x = \bar{x} + x'$$

where the overbar marks the ensemble average and the prime the across-
ensemble variation. Linear expansion of equation (3) thus produces

$$\overline{\Delta T} + \Delta T' = \frac{\overline{\Delta F}}{\bar{\alpha} + \bar{\kappa}} + \frac{1}{\bar{\alpha} + \bar{\kappa}}\,\Delta F' - \frac{\overline{\Delta F}}{(\bar{\alpha} + \bar{\kappa})^{2}}\,\alpha' - \frac{\overline{\Delta F}}{(\bar{\alpha} + \bar{\kappa})^{2}}\,\kappa' + \varepsilon$$

This equation holds for each start year separately and suggests the
regression model

$$\Delta T'_j = \beta_0 + \beta_1\,\Delta F'_j + \beta_2\,\alpha'_j + \beta_3\,\kappa'_j + \varepsilon_j, \qquad j = 1, \ldots, 75$$

We thus perform for each start year a multiple linear regression of ΔT′
against ΔF′, α′ and κ′. The regression residual ε is interpreted as the
contribution from internal variability. The complete regression-based
prediction for GMST trend is obtained by adding the ensemble-mean
trend to the regression for the across-ensemble variations:

$$\widehat{\Delta T}_{\mathrm{reg},j} = \overline{\Delta T} + \hat{\beta}_0 + \hat{\beta}_1\,\Delta F'_j + \hat{\beta}_2\,\alpha'_j + \hat{\beta}_3\,\kappa'_j, \qquad j = 1, \ldots, 75 \qquad (4)$$

where the caret marks the regression estimate. We note that for a model
that has multiple realizations, the same ΔF′, α′ or κ′ value is counted
multiple times. The regression is performed separately for each period
length over which trends are computed. We interpret the ensemble spread
of the regression result $\widehat{\Delta T}_{\mathrm{reg},j}$, j = 1,...,75, as the deterministic spread
and the spread of the residuals $\hat{\varepsilon}_j$, j = 1,...,75, as the quasi-random
spread.
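
For a single start year, the regression of equation (4) can be sketched with an ordinary least-squares solve. This is an illustrative sketch, not the authors' implementation: the function `regress_trends` is a name introduced here, and the 75 synthetic ensemble members are random numbers drawn over the parameter ranges quoted in the text, with an assumed noise level standing in for internal variability.

```python
import numpy as np

def regress_trends(dT, dF, alpha, kappa):
    """Regress across-ensemble GMST trend anomalies on anomalies of
    ERF trend, feedback and ocean heat uptake efficiency.

    Returns the coefficients (beta_0..beta_3), the deterministic part
    of the anomalies, the residuals, and the full prediction of
    equation (4) (ensemble mean plus deterministic part).
    """
    X = np.column_stack([
        np.ones_like(dF),
        dF - dF.mean(),
        alpha - alpha.mean(),
        kappa - kappa.mean(),
    ])
    y = dT - dT.mean()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    deterministic = X @ beta
    residual = y - deterministic
    full_prediction = dT.mean() + deterministic
    return beta, deterministic, residual, full_prediction

# Synthetic 75-member ensemble for one start year (all values assumed).
rng = np.random.default_rng(1)
n = 75
dF = rng.normal(0.3, 0.1, n)          # ERF trend, W m^-2 per decade
alpha = rng.uniform(0.6, 1.8, n)      # W m^-2 °C^-1
kappa = rng.uniform(0.45, 1.52, n)    # W m^-2 °C^-1
dT = dF / (alpha + kappa) + rng.normal(0.0, 0.05, n)  # equation (3)

beta, deterministic, residual, full_prediction = regress_trends(dT, dF, alpha, kappa)
```

The spread of `deterministic` across members corresponds to the deterministic spread and the spread of `residual` to the quasi-random spread; repeating the fit for every start year and trend length reproduces the structure of the analysis described above.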


Deterministic versus quasi-random spread 

For 15-year GMST trends, deterministic across-ensemble variations are 
smaller than internal variability, as shown by the comparison of the 
regression-based ensemble spread with the regression residuals (Fig. 2b 
and Fig. 2c, respectively). The regression result shows substantial time 
dependence in ensemble spread only for 15-year periods influenced by 
major volcanic eruptions, in particular the Mount Agung eruption in
1963 (Fig. 2b; the deterministic ensemble spread is particularly large in
these periods; see Extended Data Fig. 4a). The distribution of residuals 
shows little time dependence, as evidenced by spread that is similar for 
all start years (Fig. 2c-f). The generally weak time dependence of the 
spread suggests that we can estimate the magnitudes of deterministic 
spread and internal variability from the marginal distributions obtained 
by time-averaging the distributions shown in Fig. 2b and Fig. 2c, re- 
spectively. The 5-95% range is 0.11 °C per decade for the regression 
result and 0.26 °C per decade for the residuals; internal variability thus 
dominates deterministic spread by a factor of 2.5. The dominance of 
internal variability in the ensemble spread of the 15-year GMST trends 
indicates that, viewed over the entire period 1900-2012, no systematic 
model error needs to be invoked when trying to explain differences be- 
tween simulated and observed trends. In particular, the GMST spread 
due to feedback α is not systematically larger than the spread from either
the ERF trend or ocean heat uptake efficiency, and is much smaller than
the internal variability (Extended Data Fig. 4 and Fig. 2; see also ref. 12).

Figure 2 | Regression-based and observed 15-year GMST trends since 1900.
a, Joint relative frequency distribution of regression-based GMST trends
(equation (4)) as a function of start year and trend size (in bins of 0.025 °C per
decade, as shown by colour scale), based on the reduced 75-member ensemble
for which forcing information is available. The thick red line marks the
ensemble average, the thick black line marks the observed trend and whiskers
indicate the 5-95% confidence range derived from f. b, Joint relative frequency
distribution of regression result (equation (4) minus the ensemble-mean
trend) as a function of start year and trend size (in bins of 0.025 °C per decade).
The P value of the regression has a median across start years of 0.075, based on
the null hypothesis that all regression coefficients are zero. c, Joint relative
frequency distribution of regression residual as a function of start year and
trend size (in bins of 0.025 °C per decade). d, Vertical cross-section of c for start
year 1927. e, Vertical cross-section of c for start year 1998. f, Marginal
distribution of regression residual as a function of trend size, obtained by time-
averaging the joint distribution in c. All histograms are normalized such
that their area integral is unity. In a-c, each vertical cross section is normalized,
and the ordinate ranges are identical.

For any given start year, the residual spread is very similar to the full 
ensemble spread, implying that we can indeed use the ensemble spread 
as a measure of internal variability (compare Fig. 1b, c with Fig. 2d, e). 
Furthermore, identifying the ensemble spread of the regression resi- 
duals with internal variability allows us to characterize the component 
of observational uncertainty that arises from internal variability (Fig. 2a, f). 
This uncertainty does not concern the construction of the global aver-
age from individual station data (which has much smaller uncertainty)
but relates to the question of whether an observed trend is statistically
significant (detectable) given serial correlation arising from internal
variability. Our model-based estimate of 0.26 °C per decade for the
5-95% confidence interval for observed 15-year GMST trends is slightly
larger than the AR5 serial-correlation-based estimate for the uncer-
tainty in the observed GMST trend over the hiatus period (0.2 °C per
decade; ref. 4). We deem this an acceptable agreement given that the
estimates were obtained through completely different approaches. We
further note that the CMIP5 ensemble has been assessed to be generally
consistent with observed historical decadal variability in GMST, al-
though on average it somewhat overestimates the global variability in
the lower troposphere.

For most of the historical period, the entire ensemble of regression- 
based simulated 15-year GMST trends lies within the model-estimated 
5-95% confidence interval of the observations (Fig. 2a). The regres- 
sion-based simulated ensemble partly falls outside this interval during 
the cooling following the Mount Agung eruption and the subsequent 
warming recovery, as well as for start dates after 1990, which include 
the warming recovery following the Mount Pinatubo eruption and the 
surface warming hiatus (Fig. 2a). Because the phases of volcanically 
driven cooling and subsequent warming coincide with larger regres- 
sion spread due to the ERF trend (Extended Data Fig. 4), we speculate 
that the implementation of volcanic forcing requires improvement in 
some climate models. 

The ensemble spread of 62-year GMST trends is dominated by in- 
ternal variability for start years early in the twentieth century, but for 
start years from 1910 onwards, the deterministic spread increases and 
dominates for start years 1920 and later (Fig. 3). The 5-95% range of 
the regression residuals is 0.059 °C per decade, compared with a deter- 
ministic range of 0.032 °C per decade for start year 1900 and 0.093 °C 
per decade for start year 1951. The 5-95% deterministic range for all 
62-year trends is 0.081 °C per decade, which is larger by one-third than 
the 5-95% range from internal variability. Nevertheless, we see a sub- 
stantial influence of internal variability even for GMST trends over 
62 years. 
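
The 5-95% ranges used throughout this section are percentile widths of a pooled distribution. A minimal sketch follows; the helper name `range_5_95` and the evenly spaced test sample are assumptions, while the two range values in the ratio are the 62-year figures quoted above.

```python
import numpy as np

def range_5_95(samples):
    """Width of the interval between the 5th and 95th percentiles."""
    lo, hi = np.percentile(np.asarray(samples, dtype=float).ravel(), [5, 95])
    return float(hi - lo)

# Evenly spaced sample 0.00, 0.01, ..., 1.00 (an assumption for illustration).
samples = np.arange(101) / 100.0
width = range_5_95(samples)

# Ratio of deterministic to quasi-random 5-95% range for 62-year trends,
# using the values quoted in the text (°C per decade).
ratio_62 = 0.081 / 0.059
```

The ratio comes out at roughly 1.37, consistent with the statement that the deterministic range is larger by about one-third than the range from internal variability.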

When observational uncertainty is accounted for—again on the basis 
of the 5-95% confidence interval derived from quasi-random model 
spread—the ensemble-mean simulated 62-year GMST trend is consist- 
ent with the observed trend for all start years after around 1915; before 
that, the simulations tend to warm too little (Fig. 3a). After around 1945, 
the ensemble-mean simulated 62-year trend lies above the observed 
trend, although their difference is smaller than the range of internal 
variability. From around 1925 onward, both the largest and the smal-
lest individual regression-based simulated trends lie outside the range
defined by observations plus internal variability and are hence judged
to be inconsistent with observations (Fig. 3a).

The cause of this inconsistency can be traced almost entirely to the 
contribution to the regression by the ERF trend (Fig. 3). By contrast, 
the magnitude of the contributions from α and κ is around 0.01 °C per
decade or less for all start years (Fig. 3e, f). The deterministic ensemble 
spread in 62-year GMST trend is hence dominated by the spread in 
ERF throughout the twentieth century (Fig. 3). 


Discussion 


Viewed over the entire period since 1900, the differences between simu-
lated and observed 15-year trends in GMST are dominated by internal
variability and hence arise largely by coincidence, with a minor con-
tribution from volcanic forcing that is sometimes too strong in some
models (Fig. 2). Furthermore, we confirm, and extend to all 15-year
radiative forcing trends since 1900, the AR5 assessment for the hiatus
period that the CMIP5 models show little systematic bias when com-
pared with the AR5 best-estimate radiative forcing trend, despite the
substantial scatter about the ensemble mean (Extended Data Fig. 2).

Figure 3 | Regression-based and observed 62-year GMST trends since 1900.
a, Joint relative frequency distribution of regression-based GMST trends
(equation (4); shown by colour scale) as a function of start year and trend size,
based on the reduced 75-member ensemble for which forcing information is
available. The thick red line marks the ensemble average, the thick black line
marks the observed trend and whiskers indicate the 5-95% confidence
range derived from the marginal distribution of c. b, Joint relative frequency
distribution of regression result (equation (4) minus the ensemble-mean trend)
as a function of start year and trend size. All P values of the regression are
below 0.001, based on the null hypothesis that all regression coefficients
are zero. c, Joint relative frequency distribution of regression residual as a
function of start year and trend size. d, Joint relative frequency distribution of
regression contribution from trend in effective radiative forcing. e, Joint relative
frequency distribution of regression contribution from climate feedback
parameter α. f, Joint relative frequency distribution of regression contribution
from ocean heat uptake efficiency κ. In all joint relative frequency distributions,
GMST trend is collected in bins of 0.0125 °C per decade, and each vertical
cross section is normalized such that its area integral is unity. All ordinate
ranges are identical.

The generally dominant role of internal variability in shaping simu- 
lated 15-year GMST trends implies that internal variability also dom- 
inates the difference between simulations and observations during the 
hiatus period. This conclusion considerably sharpens the relative roles 
of internal variability, forcing error and response error, compared with 
the corresponding AR5 assessment. Although there is no obvious con-
tribution of forcing bias in the CMIP5 models (Extended Data Fig. 2),
the diagnosed radiative forcing is uncertain. Hence, our analysis can-
not rule out a small contribution from a systematic forcing bias
in the models. In particular, volcanic forcing is estimated to contribute
to the difference between simulations and observations by up to 15%
over 1998-2012, with large uncertainty in the magnitude. This is a
contribution that our method cannot detect. Furthermore, the period
1998-2012 stands out as the only one during which the HadCRUT4
15-year GMST trend falls entirely outside the CMIP5 ensemble (if only
narrowly), suggesting that the CMIP5 models could be missing a cooling
contribution from the radiative forcing during the hiatus period,
or that there has been an unusual enhancement of ocean heat uptake
not simulated by any model.

©2015 Macmillan Publishers Limited. All rights reserved

For 62-year GMST trends since 1900, the difference between simu- 
lations and observations is dominated by the spread in the radiative 
forcing trend in the models, with a smaller yet substantial influence of 
internal variability (Fig. 3). Our simple regression-based estimate of 
internal variability in 62-year GMST trends corresponds to a 17-83% 
range of ±0.11 °C for the temperature change over six decades, which
is in excellent agreement with the value of ±0.10 °C that has been found
for the period 1951-2010 using much more sophisticated formal meth-
ods of detection and attribution.
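
The link between the 5-95% residual range for 62-year trends (0.059 °C per decade) and the quoted 17-83% range of about ±0.11 °C over six decades can be checked with a short calculation. The sketch below assumes Gaussian residuals and a 6.2-decade span; both are assumptions made here for illustration, since the text does not state how the conversion was done.

```python
from statistics import NormalDist

# Standard-normal quantiles for the two interval conventions.
z95 = NormalDist().inv_cdf(0.95)  # half-width of a 5-95% interval, in sigma
z83 = NormalDist().inv_cdf(0.83)  # half-width of a 17-83% interval, in sigma

residual_range_5_95 = 0.059       # °C per decade, quoted for 62-year trends
decades = 6.2                     # 62 years expressed in decades

# Assumed Gaussian: full 5-95% width equals 2 * z95 * sigma.
sigma = residual_range_5_95 / (2.0 * z95)
half_width_17_83 = z83 * sigma                        # °C per decade
change_over_62_years = half_width_17_83 * decades     # °C
```

Under these assumptions the half-width comes out close to 0.11 °C, in line with the value stated in the text.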

There is scientific, political and public debate regarding the question 
of whether the GMST difference between simulations and observations 
during the hiatus period might be a sign of an equilibrium model re- 
sponse to a given radiative forcing that is systematically too strong, or, 
equivalently, of a simulated climate feedback α that is systematically too
small (equation (2)). By contrast, we find no substantive physical or sta- 
tistical connection between simulated climate feedback and simulated 
GMST trends over the hiatus or any other period, for either 15- or 62- 
year trends (Figs 2 and 3 and Extended Data Fig. 4). The role of sim- 
ulated climate feedback in explaining the difference between simulations 
and observations is hence minor or even negligible. By implication, the 
comparison of simulated and observed GMST trends does not permit 
inference about which magnitude of simulated climate feedback (ranging
from 0.6 to 1.8 W m⁻² °C⁻¹ in the CMIP5 ensemble) better fits
the observations. Because observed GMST trends do not allow us to 
distinguish between simulated climate feedbacks that vary by a factor 
of three, the claim that climate models systematically overestimate the 
GMST response to radiative forcing from increasing greenhouse gas 
concentrations seems to be unfounded. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

Extended Data Figure 1 | Observed and simulated time series of the
anomalies in annually averaged GMST, from 1900 to 2012. All anomalies are
differences from the 1961-1990 temporal mean of each individual time series.
GMST is the globally averaged merged surface temperature (2 m height
over land and surface temperature over the ocean). The figure shows single
simulations for the CMIP5 models (thin lines), the multimodel ensemble mean
(thick red line) and the HadCRUT4 observations (thick black line). All model
results have been subsampled using the HadCRUT4 observational data mask.
a, 114 realizations from the CMIP5 archive, obtained with 36 different
models. b, Subset of 75 realizations with the 18 different models for which
information on ERF is available (Extended Data Table 1). The two model
ensembles are nearly indistinguishable.

Extended Data Figure 2 | Time series of trends in ERF, as a function of start
year. a, 15-year trends; b, 62-year trends. Thin coloured lines show individual
models as diagnosed previously; if multiple realizations were available for a
model, then the ensemble average of the individual diagnosed ERF time series
for that model was given and is shown here. The thick red line shows the
ensemble average over all models. The thick black line shows the best estimate
from AR5, including, for illustration, the 5-95% uncertainty range for the
periods 1984-1998 (a) and 1951-2011 (b), taken from fig. 8.19 in ref. 46. These
uncertainty ranges, both of which are around 0.2 W m⁻² per decade, do not
take into account observational biases such as those diagnosed in ref. 48.
Despite the scatter of the CMIP5 ensemble trends, the ensemble mean is in
good agreement with the AR5 best estimate for almost all start years. The AR5
best-estimate ERF sums time series of forcing across individual forcing terms.
Individual time series of AR5 ERF were derived in different ways. Greenhouse
gas concentrations (observed or inferred), stratospheric aerosol optical depth
and total solar irradiance were used to derive estimates of radiative forcing
using simple formulae. Surface albedo forcing was derived from estimated
anthropogenic vegetation trends. Ozone and aerosol forcings were derived
from chemical transport model results with aspects of the forcing constrained
by other modelling approaches or observations, or both. ERF sums rapid
adjustments with traditional radiative forcings. Most time series in AR5 were
based on traditional radiative forcings, and only CO₂ and aerosol forcings
included an assessment of the rapid adjustment. In other cases ERF and
radiative forcings were assumed to be the same. The AR5 ERF for the most
recent 2000-2011 period included updated estimates of volcanic and solar
forcing, taking into account the broader 2008-2009 solar minimum and
post-2000 volcanic activity. These two cooling influences are not included in
the CMIP5 ERF; it is hence surprising and unexplained why the CMIP5
ensemble-mean of 15-year ERF trends lies below the best-estimate AR5 ERF
trend for the latest start years in a.

Extended Data Figure 3 | Joint relative frequency distribution as a function
of GMST trend and ERF trend, for the reduced 75-member ensemble for
which forcing information is available and all start years. a, 15-year trends;
bin sizes are 0.025 °C per decade and 0.05 W m⁻² per decade for GMST
[...] decade and 0.025 W m⁻² per decade for GMST and ERF trend, respectively.
The ‘climate resistance’, ρ, is given by ρ = α + κ (refs 35-37). Each joint
distribution is normalized such that its area integral is unity. Note the different
axes, reflecting the much tighter correlation of the 62-year trends.

Extended Data Figure 4 | Regression-based 15-year GMST trends since
1900. a, Joint relative frequency distribution of regression result (equation (4)
minus the ensemble-mean trend) as a function of start year and trend size.
The P values of the regression have a median across start years of 0.075,
based on the null hypothesis that all regression coefficients are zero. b, Joint
relative frequency distribution of regression contribution from the trend in
ERF. c, Joint relative frequency distribution of regression contribution from the
climate feedback parameter α. d, Joint relative frequency distribution of
regression contribution from the ocean heat uptake efficiency κ. In all
joint relative frequency distributions, GMST trend is collected in bins of
0.025 °C per decade, and each vertical cross section is normalized such that
its area integral is unity.

Extended Data Table 1 | CMIP5 models used in this study

Model name
ACCESS1-0
ACCESS1-3
bcc-csm1-1
bcc-csm1-1-m
BNU-ESM
CanESM2
CCSM4
CESM1-BGC
CESM1-CAM5
CMCC-CM
CMCC-CMS
CNRM-CM5
CSIRO-Mk3-6-0
FIO-ESM
GFDL-CM3
GFDL-ESM2G
GFDL-ESM2M
GISS-E2-H
GISS-E2-H-CC
GISS-E2-R
GISS-E2-R-CC
HadCM3
HadGEM2-AO
HadGEM2-CC
HadGEM2-ES
IPSL-CM5A-LR
IPSL-CM5A-MR
IPSL-CM5B-LR
MIROC5
MIROC-ESM
MIROC-ESM-
MPI-ESM-LR
MPI-ESM-MR
MRI-CGCM3
NorESM1-M

The originating institutions and publications documenting the models are listed comprehensively in
table 9.A1 of ref. 5.

An Arabidopsis gene regulatory network
for secondary cell wall synthesis
Taylor-Teeples, L. Lin, M. de Lucas, G. Turco, T. W. Toal, A. Gaudinier, N. F. Young, G. M. Trabucco,
T. Veling, R. Lamothe, P. P. Handakumbura, G. Xiong, C. Wang, J. Corwin, A. Tsoukalas, L. Zhang, D. Ware,
Pauly, D. J. Kliebenstein, K. Dehesh, I. Tagkopoulos, G. Breton, J. L. Pruneda-Paz, S. E. Ahnert, S. A. Kay,
P. 


The plant cell wall is an important factor for determining cell shape, function and response to the environment. Sec- 
ondary cell walls, such as those found in xylem, are composed of cellulose, hemicelluloses and lignin and account for the 
bulk of plant biomass. The coordination between transcriptional regulation of synthesis for each polymer is complex and 
vital to cell function. A regulatory hierarchy of developmental switches has been proposed, although the full com- 
plement of regulators remains unknown. Here we present a protein-DNA network between Arabidopsis thaliana 
transcription factors and secondary cell wall metabolic genes with gene expression regulated by a series of feed-forward 
loops. This model allowed us to develop and validate new hypotheses about secondary wall gene regulation under abiotic 
stress. Distinct stresses are able to perturb targeted genes to potentially promote functional adaptation. These interactions 
will serve as a foundation for understanding the regulation of a complex, integral plant component. 


Plant cell shape and function are in large part determined by the cell 
wall. Almost all cells have a primary wall surrounding the plasma mem- 
brane. Specialized cell types differentiate by depositing a secondary cell 
wall upon cessation of cell elongation. In addition to providing mech- 
anical support for water transport and a barrier against invading path- 
ogens, the polymers contained within the wall are an important renewable 
resource for humans as dietary fibre, as raw material for paper and pulp 
manufacturing, and as a potential feedstock for biofuel production. 
Secondary cell walls account for the bulk of renewable plant biomass 
available globally. 

The secondary cell wall consists of three types of polymer—cellulose, 
hemicelluloses and lignin—and is found in xylem, fibres and anther 
cells. Cellulose microfibrils form a main load-bearing network. Hemi- 
celluloses include xylans, glucans, and mannans. Lignin is a complex 
phenylpropanoid polymer that imparts ‘waterproofing’ capacity as well 
as mechanical strength, rigidity and environmental protection. Despite 
the importance of the plant secondary cell wall, our knowledge of the 
precise regulatory mechanisms that give rise to these metabolites is 
limited. The expression of cell wall-associated genes is tightly
spatio-temporally co-regulated. However, the pervasive functional redundancy
within transcription factor families, the combinatorial complexity
of regulation, and activity in a small number of cell types render
functional characterization from single-gene experiments difficult. A model
of master regulators has been proposed with NAC domain and homeo- 
box HD-ZIP Class III (HD-ZIPIII) transcription factors initiating cell 
specification and secondary cell wall synthesis in Arabidopsis thaliana. 
In this model, VASCULAR-RELATED NAC DOMAIN6 (VND6) and 
VND7 are sufficient but not necessary to regulate xylem vessel formation;
additionally, the HD-ZIPIII transcription factor PHABULOSA
(PHB) also regulates vessel formation, and acts in a highly redundant
manner with four other HD-ZIPIII factors. In anthers, two NAC domain
transcription factors, NAC SECONDARY WALL THICKENING1
(NST1) and NST2, are sufficient to drive the secondary cell wall
biosynthetic program, but act redundantly. Thus, regulation of this process
is highly redundant and combinatorial. However, no comprehensive
map of interactions has been developed at cell-type resolution over time,
nor have upstream regulators been identified. We therefore chose to
pursue a network-based approach to comprehensively characterize the
transcriptional regulation of secondary cell wall biosynthesis.


Mapping the secondary cell wall synthesis regulatory 
network 


To systematically map this regulatory network at cell-type resolution,
we used a combination of high-spatial-resolution gene expression data
and the literature to identify fifty genes implicated in xylem cell
specification. These included transcription factors and enzymes involved in
cellulose, hemicellulose and lignin biosynthesis that are expressed in
root xylem cells (Supplementary Table 1; Methods). Selection of both
developmental regulators and downstream functional genes allowed us
to interrogate upstream regulatory events that determine xylem
specification and differentiation associated with secondary cell wall
synthesis. Promoter sequences were screened using an enhanced yeast
one-hybrid (Y1H) assay against 467 (89%) of root-xylem-expressed
transcription factors. Protein interactions were identified for 45 of the
promoters (Supplementary Table 2). The final network comprises 242 genes
and 617 protein-DNA interactions (Fig. 1a; http://gturco.github.io/ 
trenzalore/stress_network). Thirteen of the transcription factors have 
been previously identified as having a role in xylem development or 
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Figure 1 | Regulators of xylem development and secondary cell wall 
biosynthesis. a, Gene regulatory network for secondary cell wall biosynthesis 
in Arabidopsis root xylem. Nodes, transcription factors or promoters; edges, 


secondary cell wall biosynthesis. Six of the transcription factors were
previously shown to bind to these promoters and a further nine of the
protein-DNA interactions were implied in gene expression studies, that
is, without demonstrating direct binding. These interactions represent
independent validation of our approach (Supplementary Table 2;
Extended Data Fig. 1). Altogether, the network contains 601 novel
interactions, although false negatives and false positives are a component
of all network approaches.

Our Y1H approach revealed a highly interconnected regulatory network.
On average, each cell wall gene promoter was bound by 5 transcription
factors from 35 protein families with over-representation of
AP2-EREBP, bHLH, C2H2, C2C2-GATA and GRAS gene families
(Supplementary Table 3). Our network adds a further layer of gene
regulation with novel factors upstream of VND6 and VND7 and supports
feed-forward loops as an overarching theme for regulation of
this developmental process with a total of 96 such loops (Fig. 1a, b).
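
To make the notion of a feed-forward loop concrete, the sketch below enumerates such loops in a directed network, taking a loop to be three distinct nodes X, Y and Z with edges X→Y, Y→Z and X→Z. It is an illustration under stated assumptions, not the analysis used in this study: the function name is hypothetical, and the toy edge list uses placeholder targets (GENE_A, GENE_B) rather than interactions from Supplementary Table 2.

```python
def feed_forward_loops(edges):
    """Return the set of (x, y, z) triples with edges x->y, y->z and x->z."""
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)

    loops = set()
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            for z in targets.get(y, ()):
                # z must be a third node that x also regulates directly
                if z != x and z != y and z in x_targets:
                    loops.add((x, y, z))
    return loops

# Toy network: an upstream factor regulates an intermediate factor and a
# shared downstream gene. Edges are illustrative placeholders only.
toy_edges = [
    ("E2Fc", "VND7"),
    ("E2Fc", "MYB46"),
    ("E2Fc", "GENE_A"),
    ("VND7", "GENE_A"),
    ("MYB46", "GENE_B"),
]
print(feed_forward_loops(toy_edges))
```

In the toy network only the E2Fc, VND7, GENE_A triple qualifies, because MYB46 has no target that E2Fc also binds.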

To organize the network, we employed a power graph compression 
approach to condense the network into overlapping node sets with sim- 
ilar connectivity. Protein-DNA interactions (edges) between proteins 
and promoters (nodes) in the original network were replaced by ‘power 
edges’ between overlapping ‘power nodes’. A power edge exists between 
suites of transcription factors that bind to the same set of promoters. 
Using this approach, 24 power edges were observed (Supplementary 
Table 4; Fig. 1c). Some sets could be distinguished on the basis of target 
gene function. For instance, one power edge connects 16 transcription 
factors with promoters of two lignin genes, 4CL1 and HCT, while an- 
other power edge connects three transcription factors with genes related 
to cellulose and hemicellulose biosynthesis such as CESA4, CESA7, 
IRX9, COBL4 and GUX2. 
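
The idea behind grouping transcription factors that bind the same promoters can be sketched in a few lines. This is a deliberate simplification and not the power graph algorithm used here, which also handles overlapping and nested node sets. The function name and the transcription factor labels (TF_A, TF_B, TF_C) are hypothetical.

```python
from collections import defaultdict

def group_by_shared_targets(edges):
    """Group transcription factors whose promoter target sets are identical.

    edges: iterable of (transcription_factor, promoter) pairs.
    Returns {frozenset(promoters): set(transcription_factors)} restricted
    to groups containing more than one transcription factor.
    """
    targets = defaultdict(set)
    for tf, promoter in edges:
        targets[tf].add(promoter)

    groups = defaultdict(set)
    for tf, promoters in targets.items():
        groups[frozenset(promoters)].add(tf)

    return {proms: tfs for proms, tfs in groups.items() if len(tfs) > 1}

# Placeholder factors binding two lignin gene promoters named in the text.
toy_edges = [
    ("TF_A", "4CL1"), ("TF_A", "HCT"),
    ("TF_B", "4CL1"), ("TF_B", "HCT"),
    ("TF_C", "CESA4"),
]
for promoters, tfs in group_by_shared_targets(toy_edges).items():
    print(sorted(tfs), "->", sorted(promoters))
```

Each returned group corresponds loosely to one edge between a set of factors and a set of promoters, which is what makes the compressed network easier to read than the full edge list.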


Testing interactions predicted by the network 


Using our network, we hypothesized that E2Fc is a key upstream reg- 
ulator of VND6, VND7 and secondary cell wall biosynthesis genes. This 
hypothesis is based on our finding that E2Fc bound to 23 promoters in- 
cluding those of VND6, VND7 and MYB46, and cellulose-, hemicellulose- 
and lignin-associated genes (Fig. 2a). VND7 and MYB46 are also known
to bind to the promoters of many of these genes, creating
a suite of feed-forward loops. E2Fc is a known negative regulator of
endoreduplication. Before terminally differentiating, xylem cells elongate
and likely undergo endoreduplication before secondary cell wall
deposition. E2Fc can act as a transcriptional repressor as well as
a transcriptional activator, and here we report both. E2Fc activated
VND7 expression in a dose-dependent manner (Fig. 2b and Extended
Data Fig. 2a, b) in transient assays, but not in the presence of
RETINOBLASTOMA-RELATED (RBR) protein, as is typical of E2F


572 | NATURE | VOL 517 | 29 JANUARY 2015 




protein-DNA interactions. Edges in feed-forward loops are red. b, A sample 
feed-forward loop in red. c, ‘Power edges’ between node sets. d, The secondary 
wall network from sub-fragments of cell wall promoters. 


transcription factors (Extended Data Fig. 2c). In an E2Fc-overexpressor
line with the amino terminus deleted to overcome post-translational
degradation, regulation of VND7 expression by extremely high or
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Figure 2 | E2Fc represses secondary cell wall gene biosynthesis. a, E2Fc- 
DNA interactions. Solid edges, Y1H; dashed edges, literature. b, Bright field 
(top) and dark-field (bottom) of representative leaves (n = 20) expressing 
VND7::LUC or together with 35S::E2Fc in 1:0.1, 1:1, 1:2, 1:5, and 1:10 ratios, 
respectively. c, VND6 and VND7 expression relative to UBC10 control in an 
E2Fc RNA interference (RNAi) line relative to wild type. n = 2 biological 
replicates with 3 technical replicates. d, e, Phloroglucinol staining of lignin 
(n = 6 per genotype (Col-0 and E2Fc RNAi, respectively), representative 
images shown) (d) and crystalline cellulose in wild-type and E2Fc-knockdown
roots (n = 3 pooled samples for each genotype, each pooled sample has
approximately 1,000 individuals) (e). AIR, alcohol-insoluble residues. For all
panels, *P < 0.05 from Student’s t-test and data are means ± s.d.
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Figure 3 | Tissue-specific VND7 regulation and VND7 targets. a, REV and 
PHB expression relative to -tubulin control following dexamethasone 
treatment of 35S::VND7:VP16:GR relative to untreated. n = 3 biological 
replicates; * significantly different from f, ¢ significantly different from }, and * 
significantly different from {, P< 0.01. b, PAL4 expression relative to 
AT5G15710 control in rev-5 relative to wild-type. c, PAL4 expression relative 
to UBC21 control following one hour dexamethasone (Dex) treatment of 
35S:REV:GR relative to untreated. *P < 0.05 for panels b and c, n = 2 biological 
replicates with 3 technical replicates. All panels show data as means ± s.d.,
with P calculated from Student's t-test. 


low E2Fc levels resulted in VND7 repression, whereas moderate E2Fc 
levels resulted in VND7 activation (Extended Data Fig. 2b). This dynamic 
regulation was also observed in an E2Fc-knockdown line, where
transcript abundance of VND6 and VND7 was significantly increased 
(Fig. 2c). Based on our results, we propose that E2Fc acts in a complex, 
concentration-dependent manner to regulate gene expression either 
as an activator or a repressor. Coincident with the repression observed 
in E2Fc-knockdown lines, ectopic patches of lignin were observed near 
the root-shoot junction using phloroglucinol staining (Fig. 2d). A sig- 
nificant increase in crystalline cellulose in the knockdown line was 
observed using an Updegraff assay (Fig. 2e). 

All five HD-ZIPIII transcription factors, including REVOLUTA
(REV), PHB, and PHAVOLUTA, are jointly necessary for xylem cell
specification and secondary wall synthesis. We found that VND7 bound
REV and PHB promoters in yeast. VND7 has been shown to act as a
transcriptional activator or as a repressor when complexed with VNI2.
With a dexamethasone-inducible version of VND7, transcript levels
of REV and PHB were significantly decreased by 2.5-fold following
induction (Fig. 3a). The REV transcription factor bound to the promoter
of the lignin biosynthesis gene PHENYLALANINE AMMONIA LYASE4
(PAL4). In a rev-5 loss-of-function mutant, PAL4 significantly increased
in transcript abundance (Fig. 3b) and transient induction of REV by a
glucocorticoid receptor fusion resulted in a decrease of PAL4 expression
(Fig. 3c). Taken together, these data suggest that E2Fc can activate
VND7 expression in a dose-dependent manner, while VND7, possibly
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in concert with VNI2, can repress REV expression, and REV can repress 
expression of PAL4. This series of interactions predicted by the network 
model and tested by perturbation analyses ensures that activation of 
VND7 and coordination of lignin biosynthesis is tightly regulated. 

We next sought to identify all transcription factors that potentially 
regulate secondary cell wall biosynthesis genes, not just in root xylem 
cells but also in above-ground cell types including xylary fibres,
interfascicular fibres and anthers. Many of the biosynthetic genes
downstream of the key NAC domain transcription factors act in both the root
and the shoot. To expand the network, we used Y1H to screen
multiple smaller promoter fragments of a subset of promoters included in
the root xylem network, including genes associated with cellulose, hemi- 
cellulose and lignin biosynthesis against a library of 1,664 full-length 
Arabidopsis transcription factors (Supplementary Tables 5, 6). We ob- 
served a total of 413 interactions that included proteins from 36 of the 
75 protein families tested (Supplementary Table 7; Fig. 1d; http://gturco. 
github.io/trenzalore/secondary_cell_wall). We found an over-representation 
of AP2-EREBP, bZip, ZF-HD, MYB and GeBP families (Supplementary 
Table 8). Each promoter interacted with an average of 38 different pro- 
teins, generating even more possibilities for combinatorial, redundant 
or condition-specific gene regulation. As in the root xylem network,
previously reported protein-DNA interactions were observed in this screen,
including MYB46 and MYB83 binding the promoters of CESA genes
(Supplementary Table 7). Since most of these interactions were novel,
a subset was additionally validated. Transient expression of AIL1, MYB83,
MYB54, NAC92, NST2 and SND1 caused a significant increase in 
CESA4::LUC activity in tobacco (measured by luciferase activity), indi- 
cating binding and activation of the CESA4 promoter (Fig. 4a). We fur- 
ther tested three regions of the CESA4 promoter with two NAC family 
proteins, SND1 and NST2 (Fig. 4b, c), using an in vitro electrophoretic 
mobility shift assay (EMSA). Extracts of Escherichia coli expressing either 
glutathione-S-transferase-conjugated NST2 (GST:NST2) or GST:SND1 
in the presence of a CESA4-2pr promoter probe produced DNA species 
with retarded mobility (Fig. 4b, c). We also observed binding of the 
CESA7, CESA8 and KOR promoter fragments with the NST2 protein 
and CESA8 with the SND1 protein (Extended Data Fig. 3). These in- 
teractions between NST2 and CESA4, CESA8, and KOR promoters were 
further confirmed in planta by chromatin immunoprecipitation (ChIP). 
An antibody to green fluorescent protein (GFP) was used to immuno- 
precipitate NST2 protein from extracts of 35S::NST2::GFP plants. The 
complex was significantly enriched for fragments from the CESA4, CESA8 
and KOR promoters (Fig. 4d). The tracheary element-regulating
cis-element (TERE, CTTNAAAGCNA) is a direct target of VND6. A
perfect TERE is present in the CESA4 promoter (CTTGAAAGCTA)
and TERE-like sequences are present in CESA8 (CTTCAATGTTA) and
KOR (CTTGAAAATGA). Taken together, these data clearly demonstrate
that the expression of CESA4 and other secondary cell wall genes
is mediated by the direct binding of the NAC-domain transcription
factors NST2 and SND1 to the target gene promoters via the
TERE.
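
The distinction between a perfect TERE and a TERE-like sequence can be checked by counting mismatches against the consensus, treating N as matching any base. The sketch below does this for the three sequences quoted above. The function name is hypothetical and the mismatch count is simply one convenient way to express similarity, not a criterion stated in this study.

```python
TERE = "CTTNAAAGCNA"  # consensus from the text; N matches any nucleotide

def mismatches(site, motif=TERE):
    """Count positions where site differs from motif, ignoring N positions."""
    if len(site) != len(motif):
        raise ValueError("site and motif must have the same length")
    return sum(
        1
        for base, expected in zip(site.upper(), motif)
        if expected != "N" and base != expected
    )

# Sequences quoted in the text for each promoter.
sites = {
    "CESA4": "CTTGAAAGCTA",
    "CESA8": "CTTCAATGTTA",
    "KOR": "CTTGAAAATGA",
}
for gene, site in sites.items():
    print(gene, mismatches(site))
```

For these sequences the count is 0 for CESA4 and 2 each for CESA8 and KOR, consistent with the description of one perfect and two TERE-like elements.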


Abiotic stress can co-opt the xylem regulatory network 


Having generated a gene regulatory network supported by in vivo and 
in vitro approaches, we sought to test if the model could allow us to 
predict responses under abiotic stress perturbation. Co-opting a devel- 
opmental regulatory network is likely a key mechanism to facilitate ad- 
aptation in response to stress. Thus, we hypothesized that stress responses 
are likely integrated into the gene regulatory network that determines 
xylem cell specification and differentiation and that we can predict the 
exact genes that these stresses manipulate within our network. 

We first identified genes within the network whose expression was 
altered specifically in the root vasculature in response to salt, sulphur, 
iron and pH stress and nitrogen influx. Genes within the root xylem
secondary cell wall network were significantly differentially regulated 
in response to sulphur stress, salt stress and iron deprivation (Sup- 
plementary Table 9). Substantial overlap was observed between iron 
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Figure 4 | Multiple transcription factors bind the CESA4 promoter. 
a, Activation of CESA4::LUC by transcription factors in tobacco (n = 5). 
*P < 0.05 based on Student’s t-test. Data are means ± s.d. b, c, EMSA with


deprivation and salt stress gene responses and was further character- 
ized (Fig. 5a). We filtered the xylem network to include only genes dif- 
ferentially expressed in salt or iron, creating stress-specific sub-networks 
(Extended Data Fig. 4). Previously, we determined that key developmental
transcription factors have significantly more upstream regulators
compared to other genes. In response to iron deprivation, REV has the
most upstream regulators, while in response to salt stress, VND7 and 
MYB46 have the most upstream regulators. 
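
The filtering and ranking step described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the gene labels are placeholders, and it assumes that an interaction is retained only when both the transcription factor and its target are differentially expressed, which is one reading of the filter described in the text.

```python
from collections import Counter

def stress_subnetwork(edges, responsive_genes):
    """Keep edges whose regulator and target both respond to the stress."""
    return [
        (tf, target)
        for tf, target in edges
        if tf in responsive_genes and target in responsive_genes
    ]

def upstream_regulator_counts(edges):
    """Count the number of regulators binding each target promoter."""
    return Counter(target for _, target in edges)

# Placeholder network and stress-responsive gene set.
toy_edges = [
    ("TF_A", "GENE_X"), ("TF_B", "GENE_X"), ("TF_C", "GENE_X"),
    ("TF_A", "GENE_Y"), ("TF_D", "GENE_Y"),
]
responsive = {"TF_A", "TF_B", "GENE_X", "GENE_Y"}

sub = stress_subnetwork(toy_edges, responsive)
print(upstream_regulator_counts(sub).most_common())
```

Ranking targets by this count within each sub-network is what singles out the genes with the most upstream regulators under a given stress.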

On the basis of these data from the iron-deprivation sub-network, 
we hypothesized that REV plays a key role in regulating secondary cell 
wall development in response to iron deprivation. To additionally 
determine directionality and sign (activation or repression) in the net- 
work, we constructed a network of 16 key nodes using the consensus 
network from four unsupervised and one supervised network infer- 
ence method. REV was also predicted to be an important regulator of 
lignin biosynthesis gene expression in response to iron deprivation 
using these methods (Extended Data Fig. 5). First, to test the model- 
generated prediction that lignin biosynthetic gene expression is altered 
in response to iron deprivation, we measured phenylpropanoid-related 
gene expression. An increase in 4CL1, PAL4 and HCT gene expression 
was observed (Fig. 5b). Additionally, iron deprivation stress altered the 
timing and spatial distribution of the 4CL1 transcript (Fig. 5c). These 
expression changes are accompanied by an increase in fuchsin stain- 
ing, indicative of increased phenylpropanoid deposition (Extended 
Data Fig. 6b). Expression in a rev-5 loss-of-function mutant in iron- 
deficient conditions revealed a REV- and stress-dependent influence 
on CCoAOMT1, PAL4 and HCT expression (Fig. 5d), thus validating 
our model predictions. 

In the high-salinity sub-network VND7 and MYB46 contain the most 
upstream regulators (Extended Data Fig. 4). VND7 and MYB46 expres- 
sion is greatly increased in roots in response to salt stress, but lignin 
biosynthetic gene expression is unaltered (Fig. 5e; Extended Data Fig. 6a). 
In corroboration with this hypothesis, the network model constructed 
using the described in silico methods also predicts VND7 and MYB46 
as main regulators in response to salt stress but not iron deprivation 
(Extended Data Fig. 7), and indeed this was observed with an expan- 
sion of the domain of VND7 expression after salt treatment but not 
iron deprivation (Fig. 5e, f; Extended Data Fig. 6c). In conjunction with 
this ectopic increase, we observed an additional strand of metaxylem in 
roots exposed to high salinity (Fig. 5g). 


Discussion 


Owing to functional redundancy among regulators of secondary cell 
wall biosynthesis, transcription factors have largely eluded identification 
by loss-of-function genetic screens. Our network approach has iden- 
tified hundreds of novel regulators and provided considerable insight 
into the developmental regulation of xylem cell differentiation. The 
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network, which includes a cell cycle regulator, is comprised of many 
feed-forward loops that are likely to ensure robust regulation of this 
process (Fig. 5h). Accordingly, we revealed that perturbation at distinct 
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Figure 5 | The xylem-specific gene regulatory network is responsive to high 
salinity and iron deprivation. a, Network genes responsive to high salinity 
and/or iron deprivation. b, VND7, HCT, 4CL1, PAL4 expression after iron 
deprivation. c, 4CL1::GFP expression after iron deprivation (representative 
images shown, n = 4 per line). d, Lignin gene expression after iron deprivation 
in rev-5. *P = 0.01; **P = 0.001; +P ≤ 0.0001; P values from ANOVA.
e, VND7, HCT, 4CL1, PAL4 expression after NaCl. b, d, e, Expression relative to
UBC10 and PP2AA3 controls. n = 2 biological replicates with 3 technical
replicates. b, e, *P = 0.01 based on Student’s t-test and data are means ± s.d.
f, g, Representative images of VND7::YFP (n = 5) (f) and fuchsin-staining
(n = 5) (g) after NaCl. Arrows, non-stele cells (f) and extra metaxylem
strand (g). h, Proposed regulation of secondary wall biosynthesis. 
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nodes changes the network subtly, including phenylpropanoid bio- 
synthesis in response to iron deprivation and ectopic xylem cell dif- 
ferentiation in response to salt stress (Fig. 5h). We anticipate that these 
findings will be instrumental in biotechnology and in our understand- 
ing of cell fate acquisition. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Yeast one-hybrid (Y1H) protein-DNA interaction assays. The root vascular- 
expressed transcription factor collection is described in ref. 7. The 1,663 transcription 
factor collection was assembled primarily from clones deposited in the Arabidopsis 
Biological Resource Center by various collaborative projects including the Peking-
Yale Consortium, REGIA, TIGR, and the SSP Consortium. Translational fusions
to the GAL4 activation domain were generated as described in ref. 38. A total of 
1,663 E. coli strains harbouring different Arabidopsis transcription factors (Sup- 
plementary Table 5) were arrayed in 96-well plates and plasmids were prepared 
using the Promega Wizard SV 96 plasmid purification DNA system according to 
manufacturer's recommendations. 

Root secondary cell wall gene promoters (2-3 kb of upstream regulatory region 
from the gene’s translational start site, or the next gene, whichever comes first) were 
cloned and recombined with reporter genes according to ref. 33. Promoter sequences 
and primers used are described in Supplementary Table 1. AT1G30490, AT5G60690, 
AT2G34710, AT1G71930, AT1G62990 promoter sequences and primers are des- 
cribed in ref. 33, while the promoter sequences and primers for AT5G15630 are 
described in ref. 5. For dissection of cell wall biosynthesis promoters, approximately 
1,000 bp of sequence upstream of the translational start site was tested for interac- 
tions with the transcription factor library. Three overlapping fragments of approx- 
imately equal and average size of 419 bp were independently cloned for each promoter 
according to ref. 38. The oligonucleotides used to amplify promoter fragments 
and details of their coordinates for 4CL1 (At1g51680), CESA4/IRX5 (At5g44030),
CESA7/IRX3 (At5g17420), CESA8/IRX1 (At4g18780), COBL4/IRX6 (At5g15630), 
HCT (At5g48930), IRX9 (At1g27600), IRX14 (At4g36890), KOR/IRX2 (At5g49720), 
LAC4/IRX12 (At2g38080), and REF8 (At2g40890) are described in Supplemen- 
tary Table 6. 
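
One way to picture the promoter dissection is to tile roughly 1,000 bp with three equally sized, evenly spaced fragments. The sketch below does this with integer arithmetic. It is a hypothetical layout for illustration: the function name and the even spacing are assumptions, and the actual fragment coordinates are those given in Supplementary Table 6.

```python
def overlapping_fragments(length, n_fragments=3, fragment_size=419):
    """Return (start, end) pairs for evenly spaced overlapping fragments.

    Coordinates are 0-based and half-open. The first fragment starts at 0
    and the last fragment ends at `length`.
    """
    if n_fragments < 2:
        raise ValueError("need at least two fragments")
    if fragment_size > length:
        raise ValueError("fragment_size cannot exceed length")
    span = length - fragment_size  # total distance covered by the start sites
    starts = [i * span // (n_fragments - 1) for i in range(n_fragments)]
    return [(start, start + fragment_size) for start in starts]

# Approximate values quoted in the text: about 1,000 bp, fragments of 419 bp.
print(overlapping_fragments(1000))
```

With these values neighbouring fragments overlap by about 130 bp, so a binding site near a fragment boundary is still fully contained in at least one fragment.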

Root bait promoters were screened against the stele-expressed transcription
factor collection using the Y1H protocol as previously described. The 1,663
transcription factor library was transformed into each yeast strain and the
β-galactosidase activity was determined as described in ref. 38, but in
384-well plates. Positive interactions were visually identified by the yellow
colour produced when β-galactosidase cleaves ortho-nitrophenyl from colourless
ortho-nitrophenyl-β-D-galactoside. The DNA bait strains were similarly tested
for self-activation before screening, without transformation with prey vectors,
in the presence of thiamine. All interacting transcription factors were
assembled into a cell wall interaction library, the screen was repeated to
confirm the results, and each clone was sequenced to reconfirm identity.

Statistical analysis for protein family enrichment. Enrichment was determined 
using the hypergeometric distribution online tool (http://stattrek.com/). The
population size is the number of transcription factors in the xylem transcription
factor collection, and the number of successes in the population is the number of
transcription factors of a given family in that collection. The number of
successes in the sample is the number of proteins belonging to that family in the
network, and the sample size is the total number of transcription factors within
the network. The A. thaliana transcription factor list is as described in ref. 7.
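
The enrichment test described above can be reproduced with a short hypergeometric calculation. The sketch below is a stand-in for the online tool, not a description of it: the function and parameter names are hypothetical, and apart from the population of 467 screened transcription factors the example numbers are invented for illustration.

```python
from math import comb

def hypergeom_enrichment_p(population, family_in_population, sample, family_in_sample):
    """Upper-tail hypergeometric probability P(X >= family_in_sample).

    population: transcription factors in the screened collection
    family_in_population: members of the family in that collection
    sample: transcription factors in the network
    family_in_sample: members of the family in the network
    """
    upper = min(family_in_population, sample)
    total = comb(population, sample)
    tail = sum(
        comb(family_in_population, k)
        * comb(population - family_in_population, sample - k)
        for k in range(family_in_sample, upper + 1)
    )
    return tail / total

# 467 is the collection size from the text; 40, 200 and 25 are made-up values.
print(hypergeom_enrichment_p(467, 40, 200, 25))
```

A small probability indicates that the family appears in the network more often than expected from its share of the screened collection.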

Power graph compression approach. The power graph compression was
performed using the algorithm as previously described.

Plant material. The E2Fc RNAi line is described in ref. 23 and was verified by quan- 
tifying E2Fc transcript abundance relative to the Col-0 control using an E2Fc primer 
compared to an ACTIN control primer (Supplementary Table 1). VND7::YFP lines 
are described in ref. 39. The VND7 glucocorticoid induction line is described in 
ref. 9. The rev-5 loss-of-function mutant was described in ref. 40. 

Cloning and insertion of the 4CL1 promoter into a pENTR p4-p1R donor vector 
was performed according to ref. 33 (for sequence, see Supplementary Table 1). The 
promoter was then recombined into binary vector pK7m24GW,3 along with 
pENTR 221 ER-GFP::NOS. The resulting 4CL1::GFP vector was transformed into 
Agrobacterium strain GV3101. Col-0 plants were then transformed using the floral
dip method. 

Plant growth conditions. All plants were grown vertically on plates containing 
1X Murashige and Skoog salt mixture, 1% sucrose, and 2.3 mM 2-(N-morpholino) 
ethanesulphonic acid (pH 5.8) in 1% agar. NaCl plates were made by adding 140 mM 
NaCl to this standard media. Iron control and deprivation media were made accord- 
ing to ref. 30. Plants grown on stress media (iron or salt) were first germinated on 
nylon mesh placed over control media for four days before transferring mesh with 
seedlings to iron deprivation or NaCl plates. Plants used for RNA isolation were 
also grown on nylon mesh placed over the agar to facilitate the collection of root 
material.

Determination of crystalline cellulose. Roots of 7-day-old plants were harvested 
and lyophilized. Six to ten plates of seedlings grown at the same time on the same 
media were pooled to make a single biological replicate. Crystalline cellulose was 
measured according to ref. 41. After hydrolysis of non-cellulosic polysaccharides 
from an alcohol insoluble residue wall preparation with the Updegraff reagent
(acetic acid:nitric acid:water, 8:1:2 v/v), the remaining pellet was hydrolysed in
72% sulfuric acid. The resulting glucose quantity was determined by the anthrone
method.

Phloroglucinol staining. Seedlings to be stained with phloroglucinol were fixed
five days after imbibition in a 3:1 solution of 95% ethanol:glacial acetic acid for
5 min. Samples were then transferred to a solution of 1% phloroglucinol in 50% HCl 
for 1-2 min. Whole seedlings were then mounted in 50% glycerol on slides and 
viewed using an Olympus Vanox microscope. Images were captured with a PIXERA 
Pro-600ES camera. 

Confocal laser scanning microscopy. Confocal laser scanning microscopy was 
carried out on a Zeiss LSM700. Cell walls were stained using propidium iodide as 
previously described.

Transient protein-DNA interaction detection in tobacco. β-glucuronidase. For
transient transactivation expression assays, the VND7, GAL4, and/or CyclinB1
promoters were cloned into pGWB3 to generate GUS (β-glucuronidase gene) fusion
reporters for E2Fc transcriptional activity. The E2Fc effector vector (in PYL436)
was provided by S. D. Kumar (UC Davis, CA). The effector and reporter constructs
were transformed into Agrobacterium tumefaciens strain GV3101 and co-infiltrated
with the p19 silencing inhibitor into 3-week-old Nicotiana benthamiana leaves at
A600nm 0.6:0.6:1, respectively. Leaves were harvested 3 days after agro-infiltration
and homogenized in GUS extraction buffer (50 mM sodium phosphate pH 7, 10 mM
Na2-EDTA, 0.1% SDS, 0.1% Triton TX-100 and 10 mM β-mercaptoethanol). Quantitative
MUG fluorescent assay for GUS determination was performed using 100 µg of
protein/sample in 500 µl of GUS assay buffer (1 mM 4-methylumbelliferyl
β-D-glucuronide, SIGMA, in Extraction Buffer). Samples were covered in aluminium foil
and incubated at 37 °C. The reaction was stopped at different time points by transferring
50 µl to a tube with 450 µl of Stop Buffer (0.2 M Na2CO3). 4-methylumbelliferone
fluorescence was determined using an Infinite 200 Pro-series reader (excitation at
365 nm, emission at 455 nm).

Luciferase (Fig. 2). Overnight cultures of Agrobacterium (GV3101, OD600 nm = 0.6)
carrying VND7 promoter fused to luciferase (LUC) and 35S::E2Fc were prepared
in infiltration medium (2 mM Na3PO4, 50 mM MES, 0.5% glucose, 100 µM
acetosyringone) at OD600 nm = 0.1. Subsequently, cultures containing VND7::LUC and
35S::E2Fc at respective ratios of 1:0, 1:0.5, 1:1, 1:2, 1:5, and 1:10 were spot-infiltrated
into 6-7-week-old Nicotiana benthamiana leaves. To prevent gene silencing, an
Agrobacterium strain carrying the pBIN19 suppressor from tomato bushy stunt virus
was included in each of the combinations. The LUC activity was inspected at 72 to
96 h post infiltration using a CCD camera (Andor Technology).

Luciferase imaging of VND7::LUC was performed as previously described with
modifications. Briefly, tobacco leaves were cut off after 3 days of transient
transformation and sprayed with 1 mM luciferin (Promega) in 0.01% Tween-80, then
were imaged using an Andor DU434-BV CCD camera (Andor Technology). Images
were acquired every 10 min for 12 pictures. Luciferase activity was quantified for a
defined area as mean counts per pixel per exposure time using Andor Solis image
analysis software (Andor Technology). Statistical analyses were performed using
two-tailed Student’s t-tests. The difference was considered significant if P < 0.05.
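
The quantification and test statistic described above can be written out explicitly. The sketch below is not the Andor Solis procedure: the function names, pixel counts, exposure time and activity values are all hypothetical. It computes activity as counts per pixel per unit exposure time and the equal-variance Student's t statistic with its degrees of freedom. Converting the statistic to a two-tailed P value requires a t distribution table or a statistics library, which is omitted here.

```python
from math import sqrt
from statistics import mean, variance

def luc_activity(total_counts, n_pixels, exposure_time):
    """Mean counts per pixel per unit exposure time for a defined area."""
    return total_counts / (n_pixels * exposure_time)

def student_t(sample_a, sample_b):
    """Equal-variance Student's t statistic and degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # Pooled variance from the two sample variances.
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Hypothetical activities for reporter alone versus reporter plus effector.
reporter_only = [1.0, 2.0, 3.0]
with_effector = [4.0, 5.0, 6.0]
print(luc_activity(1200, 100, 6))
print(student_t(reporter_only, with_effector))
```

Dividing by both area and exposure time keeps activities comparable across leaves imaged with different regions of interest or acquisition settings.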
Luciferase (Fig. 4). A vector system was created to generate a single vector with the
CaMV 35S constitutive promoter (35S) fused to a transcription factor, a promoter
fragment fused to the firefly luciferase reporter gene, and 35S fused to the Renilla
luciferase reporter gene. The constitutively expressed Renilla gene served as a
control to normalize for transformation efficiency. This system includes one
destination vector pLAH-LARm and three entry vectors pLAH-TF, pLAH-PROM and
pLAH-VP64Ter using MultiSite Gateway Pro Technology (Invitrogen) to
simultaneously clone three DNA fragments (Extended Data Fig. 8). To develop the
expression vector, promoter fragments and transcription factors were cloned, using
the BP system (Invitrogen), into pDONR-P3-P2 and pDONR-P1-P4 to create
pLAH-PROM and pLAH-TF, respectively. PacI-digested pMDC32 was ligated with the
2.427 kb pFLASH fragment following HindIII and SacI digestion to yield pLAH-L
with the firefly luciferase (LUC) reporter gene. The 3 kb pRTL2-Renilla
HindIII-digested fragment was inserted into SacI-digested pLAH-L to create pLAH-LR
with both firefly LUC and Renilla luciferase (REN) genes. To generate pLAH-LAR, a
SpeI-digested PCR fragment containing the AmpR gene amplified from pDEST22
was ligated with SpeI-digested pLAH-LR. To add the minimal CaMV 35S fragment
(Mini35S) before the LUC reporter gene, the gateway cassette ccdB/CmR of
pLAH-LAR was replaced by a HindIII-digested PCR fragment Mini35S-ccdB-CmR
amplified from pMDC32 using specific primer pHindIII-Rv and primer Mini35S-attR2.
The final destination vector is referred to as pLAH-LARm.

The protein coding regions of select transcription factor genes were amplified. 
Each amplified fragment was recombined with pDONR-P1-P4 vector by perform- 
ing BP reactions to produce pLAH-TF. Target promoter fragments were amplified 
from A. thaliana genomic DNA using appropriate primers with attB3 and attB2 sites 
(Supplementary Table 10). Each amplified fragment was cloned into pDONR-P3- 
P2 vector by performing BP reactions to produce pLAH-PROM. A third pDONR 
vector (pLAH-VP64Ter) was designed to create a carboxy-terminal fusion of the
strong transcription activation domain VP64 to the transcription factor followed
by the 35S transcription terminator (35St). A PCR fragment containing the VP64 region
and 35S terminator was amplified from pB7-VP64 using specific primers with
attB4r and attB3r sites (Supplementary Table 10) and cloned into pDONR P4r-P3r to produce
pLAH-VP6435T. Finally, the fully functional expression vector was generated by
Gateway LR cloning of destination vector and the three entry clones: pLAH-LARm,
pLAH-TF, and pLAH-VP64Ter (Extended Data Fig. 7).

Agrobacterium tumefaciens strain GV3103 (MP90) carrying expression constructs
was grown in Luria-Bertani media with rifampicin and ampicillin and suspended
in infiltration buffer (10 mM MES, pH 5.7, containing 10 mM MgCl2 and
150 µM acetosyringone). The cultures were adjusted to a D600 nm of 0.8 and incubated
at room temperature for at least 3 h before infiltration. The cultures were hand
infiltrated using a 1 ml syringe into 3- to 4-week-old N. benthamiana leaves. Leaf
samples were harvested 36 h after infiltration and assayed for luciferase activity
according to manufacturer instructions using the Dual-Luciferase Reporter Assay 
Systems (Promega). Approximately 100 mg of tissue was frozen in liquid nitrogen 
and homogenized using a Retsch Mixer Mill MM400 for 1 min at 30 Hz. Ground 
tissue was then thawed in lysis buffer (0.1 M HEPES, pH 7.8, 1% Triton X-100,
1 mM CaCl2 and 1 mM MgCl2) at 25 °C for 15 min. Then 50 µl of Luciferase Assay
Reagent II was added to 10 µl aliquots of the lysates to measure firefly luciferase
activity as total light emission (1,000 ms integration time) using a Spectra Max
M5/M5e plate reader. Firefly luciferase activity was quenched with 50 µl of
Stop & Glo Reagent, which contains Renilla luciferin substrate; Renilla luciferase
activity was also measured as total light emission (100 ms integration time). An expression vector containing
part of the coding sequence (+ X/+Y) of the β-glucuronidase reporter gene rather
than a transcription factor gene was used for baseline measurement of firefly lucif- 
erase activity. To estimate relative transcription factor affinity with each promoter 
fragment, three biological replicates of transcription factor expressing vectors were 
compared to the average results for the GUS expression vector. First, dividing firefly 
luciferase activity by Renilla luciferase activity normalized the transformation effi- 
ciency of each infiltrated leaf sample. Relative binding of the transcription factors to 
the promoter bait sequences was determined relative to the GUS control using a 
Student’s t-test in R v2.11.0. 
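The normalization steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the plate-reader readings are hypothetical, and the final Student's t-test (run in R in the original analysis) is not reproduced.

```python
from statistics import mean


def normalized_ratio(firefly, renilla):
    """Firefly signal divided by Renilla signal for one infiltrated leaf sample."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def relative_activity(tf_samples, gus_samples):
    """Normalized ratios of the transcription factor replicates, expressed
    relative to the average normalized ratio of the GUS control."""
    gus_mean = mean(normalized_ratio(f, r) for f, r in gus_samples)
    return [normalized_ratio(f, r) / gus_mean for f, r in tf_samples]


# Hypothetical (firefly, Renilla) readings, three biological replicates each
tf_readings = [(1200.0, 400.0), (1500.0, 500.0), (900.0, 300.0)]
gus_readings = [(500.0, 500.0), (400.0, 400.0), (600.0, 600.0)]
print(relative_activity(tf_readings, gus_readings))
```

Dividing by the Renilla signal first removes leaf-to-leaf differences in transformation efficiency, so the remaining ratio to the GUS control reflects the effect of the transcription factor on the promoter fragment.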
Electrophoretic mobility shift assays. To express recombinant NST2 or SND1 
protein, coding sequence was cloned and fused to glutathione S-transferase tag 
in the pDONR211 vector and then transferred into pDEST15 (Invitrogen). E. coli
strain BL21-AI (Invitrogen) transformed with pDEST15-GST:NST2 was grown
in liquid media to a D600 nm of 0.4, treated with 0.2% L-arabinose to induce expression
overnight and harvested by centrifugation the following day. Cells were treated
with 1 mg ml−1 lysozyme on ice for 30 min in minimal volume of 1× PBS buffer
and lysed by sonication. Cell lysates were clarified by centrifugation and incubated
with 100 µl of glutathione Sepharose beads (GE Healthcare) for 30 min at 4 °C with
rotation. The beads were transferred to a column and washed with 10 volumes of 1×
PBS. Protein was eluted in 100 mM Tris-HCl pH 8.0, 100 mM NaCl and 3 mg ml−1
glutathione buffer and purified protein was resuspended in 50% glycerol and stored
at −80 °C.

Three overlapping probes were generated for CESA7, CESA8 and KOR promoters
using the same oligonucleotides described in Supplementary Table 1, whereas
three probes were generated for CESA4 using the following primers: CESA4pr-1fwd,
CACCGGGCCTTTGTGAAATTGATTTTGGGC; CESA4pr-1rev, TGTATTTCTACTTTAGTCTTAC;
CESA4pr-2fwd, CCAGATTTGGTAAAGTTTATAAG; CESA4pr-2rev, GTGTCATAAGAAAGCTTCAAG;
CESA4pr-3fwd, TCTTATGACACAAACCTTAGAC; CESA4pr-3rev, ACACTGAGCTCTCGGAAGCAGAGCAG.
Reactions were carried out in binding buffer (10 mM Tris, pH 7.5,
50 mM KCl, 1 mM DTT, 2.5% glycerol, 5 mM MgCl2, 0.1% IGEPAL CA-630, and
0.05 µg µl−1 calf thymus DNA). Following the addition of 150 ng of protein from the
GST purification eluate, reactions were incubated at room temperature for 30 min.
Protein-DNA complexes were separated from the free DNA on 1% agarose/1× TAE
gels at 4 °C. The agarose gels were stained with ethidium bromide and bands visualized
under ultraviolet light. For the titration of promoter DNA with NST2 protein,
CESA4 promoter fragment-2 DNA and KOR promoter fragment-1 DNA
(30 ng) were titrated with increasing amounts of NST2 protein: 25, 50, 150, 300, and
600 ng. Binding reactions and the separation of protein-DNA complexes were carried
out as described above.

Chromatin immunoprecipitation of NST2. Chromatin immunoprecipitation 
was conducted as described in ref. 46 with the following modifications. Roughly 
5 g (fresh weight) whole stems from six-week-old Arabidopsis were harvested and 
crosslinked for 15 min under vacuum in crosslinking buffer (10 mM Tris, pH 8.0, 
1 mM EDTA, 250 mM sucrose, 1 mM PMSF and 1% formaldehyde). Technical
replicates containing approximately 1.5 mg DNA were resuspended in 800 µl SII
buffer, incubated with 2 µg anti-GFP antibody (ab290, Abcam) bound to Protein
G Dynabeads (Invitrogen) for 1.5 h at 4 °C and then washed five times with SII
buffer. Chromatin was eluted from the beads twice at 65 °C with Stop buffer (20 mM
Tris-HCl, pH 8.0, 100 mM NaCl, 20 mM EDTA and 1% SDS). RNase- and DNase-free
glycogen (2 µg) (Boehringer Mannheim) was added to the input and eluted
chromatin before they were incubated with DNase- and RNase-free proteinase K
(Invitrogen) at 65 °C overnight and then treated with 2 µg RNase A (Qiagen) for
1 h at 37 °C. DNA was purified by using Qiagen PCR Purification kit and resuspended
in 100 µl H2O. Quantitative PCR reactions of the technical replicates were
performed using Quantifast SYBR Green PCR Kit (Qiagen), with the following
PCR conditions: 2 min at 95 °C, followed by 40 cycles of 15 s at 95 °C, 15 s at 55 °C
and 20 s at 68 °C. Primers used in this study are listed in Supplementary Table 4.
Results were normalized to the input DNA, using the following equation:
$100 \times 2^{(\mathrm{Ct}_{\mathrm{input}} - 3.32 - \mathrm{Ct}_{\mathrm{ChIP}})}$
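The input normalization can be written as a short function. This is a sketch under one stated assumption: the constant 3.32 is read as log2(10), that is, an input sample representing 10% of the chromatin used per immunoprecipitation. The function and parameter names are hypothetical.

```python
import math


def percent_input(ct_input, ct_chip, input_fraction=0.1):
    """Percent-of-input value for one ChIP-qPCR reaction.

    input_fraction=0.1 is an assumption: it reproduces the 3.32 (log2 of 10)
    correction that appears in the equation.
    """
    adjusted_input = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input - ct_chip)


# Hypothetical Ct values: the ChIP sample crosses threshold 3 cycles after the input
print(percent_input(ct_input=25.0, ct_chip=28.0))
```

With these example values the ChIP signal is 2^-3 of the undiluted input equivalent after the 10-fold correction, so the function returns 1.25% of input.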

Quantitative RT-PCR. Primers for RT-PCR were designed to amplify a 100 bp 
region (or a 400 bp region for REV, PHB, and PHV transcripts due to sequence 
similarity) on the 3′ end of each transcript. Primer sets used for RT-PCR are
listed in Supplementary Table 1. Each plate was considered a biological replicate 
and Columbian and reference genotypes were plated on the same plate. Five days 
after imbibition, total RNA was extracted from seedling roots using an RNeasy Kit 
(QIAGEN). cDNA was synthesized by treatment with reverse transcriptase and 
oligo(dT) primer (SuperScript III First-Strand Synthesis System; Invitrogen). qRT- 
PCR was performed in an iCycler iQ Real-Time PCR Detection System (Bio-Rad) 
using the Bio-Rad iQ SYBR Green Supermix. Gene expression was measured between
wild-type and mutant pairs across at least two biological replicates with three
technical replicates using the ΔΔCt method.
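For readers unfamiliar with the ΔΔCt calculation, a minimal sketch follows. It assumes 100% amplification efficiency and normalization to a reference transcript; the reference transcript, the function name and the Ct values are hypothetical and are not taken from the study.

```python
def fold_change_ddct(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Relative expression (mutant versus wild type) by the ddCt method.

    Each genotype is first normalized to its own reference transcript (dCt);
    the difference between genotypes (ddCt) is then converted to a fold change,
    assuming the product doubles every cycle.
    """
    d_ct_mut = ct_target_mut - ct_ref_mut
    d_ct_wt = ct_target_wt - ct_ref_wt
    dd_ct = d_ct_mut - d_ct_wt
    return 2 ** (-dd_ct)


# Hypothetical mean Ct values from three technical replicates
print(fold_change_ddct(ct_target_mut=24.0, ct_ref_mut=18.0,
                       ct_target_wt=26.0, ct_ref_wt=18.0))
```

In this example the target crosses threshold two cycles earlier in the mutant at equal reference signal, which corresponds to a fourfold increase.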

VND7 induction experiments. VND7-VP16-GR plants were grown vertically on
sterile mesh placed on top of MS media with sucrose. Five days after imbibition,
seedlings were transferred, with the mesh, to MS media containing 10 µM dexamethasone
and roots were collected for qRT-PCR (RNeasy Kit; Qiagen) after 0, 1,
2, 3, or 4 h on dexamethasone (n = 3). As a positive control, upregulation of
MYB46 expression was confirmed using qRT-PCR. 

Nitrogen influx, salt stress, iron deprivation, sulphur stress, pH stress analysis. 
The data sets used contained mean expression values for each gene in both control
and treatment, and a q value for each gene indicating the significance of the hypothesis
that the expression values of control and treatment are drawn from distributions
with the same means. These data sets were filtered to extract only those
genes whose q value was ≤ 0.01 and whose fold change between mean expression
values was ≥ 1.5 in either direction. Fisher’s exact test was used to test whether the
number of such genes is overrepresented in the xylem cell specification and differentiation
gene regulatory network.
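The filtering and enrichment steps can be sketched as below. This is an illustration, not the original analysis code: the function names and the toy data are hypothetical, mean expression values are assumed to be positive and on a linear scale, and the test is written as a one-sided (overrepresentation) Fisher's exact test computed from the hypergeometric distribution.

```python
from math import comb


def responsive_genes(genes, q_max=0.01, fc_min=1.5):
    """Return genes with q <= q_max and fold change >= fc_min in either direction.

    genes maps a gene name to (mean_control, mean_treatment, q_value).
    """
    hits = set()
    for name, (control, treatment, q) in genes.items():
        fold_change = max(treatment / control, control / treatment)
        if q <= q_max and fold_change >= fc_min:
            hits.add(name)
    return hits


def fisher_overrepresentation(hits_in_network, network_size, hits_total, genes_total):
    """One-sided Fisher exact P value: probability of observing at least
    hits_in_network responsive genes in a network of network_size genes,
    given hits_total responsive genes among genes_total genes."""
    upper = min(network_size, hits_total)
    favourable = sum(
        comb(hits_total, i) * comb(genes_total - hits_total, network_size - i)
        for i in range(hits_in_network, upper + 1)
    )
    return favourable / comb(genes_total, network_size)


# Hypothetical data set of four genes
toy = {
    "a": (100.0, 200.0, 0.001),  # passes both filters
    "b": (100.0, 120.0, 0.001),  # fold change too small
    "c": (300.0, 100.0, 0.5),    # q value too large
    "d": (300.0, 100.0, 0.01),   # passes both filters (repressed)
}
print(sorted(responsive_genes(toy)))
print(fisher_overrepresentation(2, 2, 2, 4))
```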

Gene regulatory network inference. Expression data were used, after normalization
with the mmgMOS method from the PUMA R package. The supervised
regulatory interactions network was constructed using SIRENE. The directionality
of the interactions is defined by the protein-DNA interactions from Y1H data.
The interaction sign (activation or repression) is derived from Pearson’s correlation
coefficient for each protein-DNA interaction. The analysis performed was categorized
as (1) supervised tier Ia, network inferred with SIRENE with the provided
Y1H gene regulatory connections and the corresponding gene expression profiles
(16 genes, 4 transcription factors); (2) supervised tier Ib, an additional three verified
connections from the supervised tier Ia and unsupervised tier I were considered in
the inference. The unsupervised regulatory interaction network was constructed
using the consensus from four different gene regulatory network inference methods:
GENIE3, Inferelator, TIGRESS and ANOVerence. The data used were
the same as for the supervised tier Ia network. The default parameters were used in
all methods and a rank-based method was used to build the consensus network as
in ref. 53.
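One common way to build a rank-based consensus is to average the rank that each method assigns to every candidate edge. The sketch below illustrates that idea only; it is an assumption that this matches the procedure of ref. 53, and the function name, edge labels and scores are hypothetical.

```python
def consensus_ranking(score_tables):
    """Order edges by their average rank across inference methods.

    score_tables: one dict per method mapping an edge label to a confidence
    score (higher means more confident). All methods must score the same edges.
    """
    edges = sorted(score_tables[0])
    average_rank = {edge: 0.0 for edge in edges}
    for scores in score_tables:
        # Rank 1 is the most confident edge for this method
        ordered = sorted(edges, key=lambda edge: -scores[edge])
        for rank, edge in enumerate(ordered, start=1):
            average_rank[edge] += rank / len(score_tables)
    return sorted(edges, key=lambda edge: average_rank[edge])


# Hypothetical scores from three methods for three candidate edges
method_1 = {"TF1->geneA": 0.9, "TF1->geneB": 0.5, "TF2->geneA": 0.1}
method_2 = {"TF1->geneA": 0.8, "TF1->geneB": 0.2, "TF2->geneA": 0.6}
method_3 = {"TF1->geneA": 0.3, "TF1->geneB": 0.7, "TF2->geneA": 0.1}
print(consensus_ranking([method_1, method_2, method_3]))
```

Averaging ranks rather than raw scores avoids having to put the outputs of different methods on a common scale.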
Extended Data Figure 1 | Number of novel and previously described
protein-DNA interactions and transcription factors involved in secondary
cell wall biosynthesis and xylem development. a, b, Venn diagrams of
overlap between previously reported interactions (a) or transcription
factors (b) and those of the xylem-specific gene regulatory network. *Includes 
genes that were not included in the yeast one hybrid screen. 




= 

Oo 

Oo 
1 


om 


Bioluminescence (/pxel.10min) 


130.5 1 


mT 


1:10 


eae ee 


1000 


100 


10 


Relative Fold E2Fc Expression 


0 1 2 


3 


4 5 6 7 


Relative Fold VND7 Expression 


Relative Fluorescence 


VND7p:GUS 


-5000 


Extended Data Figure 2 | Activation or repression of VND7 by E2Fc is
dynamic and dose-dependent. a, Intensity of LUC bioluminescence
quantified using Andor Solis image analysis software. Data are means ± s.d.
(n = 20). Asterisks denote significance at P < 0.05 determined by Student’s
t-test. b, Quantitative PCR with reverse transcription of E2Fc and VND7
transcripts in ΔN-E2Fc (E2Fc overexpressor line lacking the N-terminal
domain) expressing plants versus Col-0 control. Red dashed line marks the
point at which VND7 is unchanged compared to control. Each data point
is an individual biological replicate with 3 technical replicates. c, 3-week-old
tobacco leaves were infiltrated with the p19 silencing inhibitor and either
the reporter VND7p::GUS or VND7p::GUS and either 35S::E2Fc::MYC or
35S::RBR::GFP, or both. Extracted protein was then used in a quantitative MUG
fluorescent assay, where relative fluorescence was measured 60 min after
incubation with substrate. Data are means ± s.d., n = 3.
Extended Data Figure 3 | Binding of NST2 and SND1 to fragments of
CESA7, CESA8, and KOR promoters. a–f, Electrophoretic mobility shift
assays showing that NST2 (a–d) and SND1 (e, f) proteins specifically bind the
promoters of cellulose-associated genes. Probe was incubated in the absence or
presence of GST or GST:SND1 protein extracts. The arrowheads indicate the 
specific protein-DNA complexes, while arrows indicate free probe. 
Extended Data Figure 4 | Sub-networks of network genes differentially
expressed in response to iron deprivation or high salinity. a, b, Sub-network
of genes with q values of ≤ 0.01 and whose fold change between mean
expression values was ≥ 1.5 in either direction in iron deprivation (a) or high
NaCl (b) stress microarray data set. Nodes are coloured according to in-degree
as shown on scale bars below sub-networks. Transcription factors with the
highest in-degree are labelled and indicated with a black circle.
Extended Data Figure 5 | The reconstructed gene regulatory consensus
network based on analysis of the iron-deprivation expression data set by
different network inference methods. a, Unsupervised; b, supervised in
the first pass; c, supervised after the validated two connections have been added
in the training set. Edge transparency denotes P = 0.06 for the Pearson
correlation coefficient (PCC); edge width is proportional to PCC; edge value
corresponds to the total edge score; a greater value corresponds to a more
significant score. Yellow and red nodes correspond to transcription factor and
target gene nodes, respectively; black and blue edges denote Y1H-derived
and inferred interactions, respectively.
Extended Data Figure 6 | Iron deprivation and NaCl stress influence lignin
and phenylpropanoid biosynthesis associated gene expression. a, No change
was observed in the expression of 4CL1::GFP in 4 days after imbibition (DAI)
roots transferred to a control media (left, n = 4) or media with 140 mM
NaCl for 48 h (right, n = 4). b, Increased fuchsin staining of xylem cells as well
as of cell walls of non-vascular cells in 4 DAI roots transferred to a control
media (left) or media with an iron chelator for 72 h (right). c, No change was
observed in the expression of VND7::YFP in 4 DAI roots transferred to a
control media (left, n = 4) or media with an iron chelator for 72 h (right, n = 5).
Extended Data Figure 7 | The reconstructed gene regulatory consensus
network based on analysis of the salt-stress expression data set by different
network inference methods. a, Unsupervised; b, supervised in the first pass;
c, supervised after the validated two connections have been added in the
training set. Edge transparency denotes P = 0.06 for the Pearson correlation
coefficient (PCC); edge width is proportional to PCC; edge value corresponds
to the total edge score; a greater value corresponds to a more significant
score. Yellow and red nodes correspond to transcription factor and target gene
nodes, respectively; black and blue edges denote Y1H-derived and inferred
interactions, respectively.
Extended Data Figure 8 | Schematic diagram of dual-luciferase reporter
vector development. a, Three distinct donor vectors harbouring either the
transcription factor, VP64 activation domain fused to the 35S minimal
promoter, or a promoter fragment. b, The dual reporter vector, pLAH-LARm,
is then recombined with the three donor vectors to generate the single reporter
vector (c).
Comprehensive genomic
characterization of head and neck
squamous cell carcinomas


The Cancer Genome Atlas Network* 


The Cancer Genome Atlas profiled 279 head and neck squamous cell carcinomas (HNSCCs) to provide a comprehensive 
landscape of somatic genomic alterations. Here we show that human-papillomavirus-associated tumours are 
dominated by helical domain mutations of the oncogene PIK3CA, novel alterations involving loss of TRAF3, and 
amplification of the cell cycle gene E2F1. Smoking-related HNSCCs demonstrate near universal loss-of-function TP53 
mutations and CDKN2A inactivation with frequent copy number alterations including amplification of 3q26/28 and 
11q13/22. A subgroup of oral cavity tumours with favourable clinical outcomes displayed infrequent copy number alterations
in conjunction with activating mutations of HRAS or PIK3CA, coupled with inactivating mutations of CASP8,
NOTCH1 and TP53. Other distinct subgroups contained loss-of-function alterations of the chromatin modifier NSD1,
WNT pathway genes AJUBA and FAT1, and activation of oxidative stress factor NFE2L2, mainly in laryngeal tumours.
Therapeutic candidate alterations were identified in most HNSCCs. 


HNSCCs affect ~600,000 patients per year worldwide. They are characterized
by phenotypic, aetiological, biological and clinical heterogeneity.
Smoking is implicated in the rise of HNSCC in developing countries,
and the role of human papillomavirus (HPV) is emerging as an important
factor in the rise of oropharyngeal tumours affecting non-smokers
in developed countries. Despite surgery, radiation and chemotherapy,
approximately half of all patients will die of the disease. Risk stratifica- 
tion for HNSCC is by anatomic site, stage and histological characteristics 
of the tumour. Except for HPV status, numerous molecular and clin- 
ical risk factors that have been investigated have limited clinical utility. 

Published genome-wide profiling studies of HNSCC are limited
to single platforms. To generate an integrated genomic annotation of 
molecular alterations in HNSCC, The Cancer Genome Atlas (TCGA) 
has undertaken a comprehensive multi-platform characterization of 
500 tumours with the a priori hypothesis of detecting somatic variants 
present in at least 5% of samples. Here, we report the results for ana- 
lyses from the first 279 patients with complete data. 


Samples and clinical data 

The cohort consists primarily of tumours from the oral cavity (n = 172 
out of 279, 62%), oropharynx (n = 33 out of 279, 12%), and laryngeal 
sites (n = 72 out of 279, 26%) (Supplementary Information section 1, 
Supplementary Table 1.1 and Supplementary Data 1.1). Most patients 
were male (n = 203 out of 279, 73%) and heavy smokers (mean pack
years = 51). Samples were classified as HPV-positive using an empiric 
definition of > 1,000 mapped RNA sequencing (RNA-Seq) reads, prim- 
arily aligning to viral genes E6 and E7 (Supplementary Information 
section 1.2 and Supplementary Fig. 1.1). The HPV status by mapping 
of RNA-Seq reads was concordant with the genomic, sequencing and 
molecular data, and indicated that 36 tumours were HPV(+) and 243
were HPV(−) (Supplementary Information section 1.2, Supplementary
Fig. 1.1 and Supplementary Data 1.2). Of 33 oropharyngeal tumours,
64% were positive for HPV, compared to 6% of 246 non-oropharyngeal
tumours. Molecular HPV signatures were identified using microRNA
(miRNA), DNA methylation, gene expression and somatic nucleotide
substitutions (Supplementary Information section 1.2 and Supplementary
Figs 1.1-1.3). HPV(+) tumours exhibited infrequent mutations in
TP53 or genetic alterations in CDKN2A. We evaluated outcome by site,
stage, HPV status, molecular subtypes and putative biomarkers (Supplementary
Information section 1.3 and Supplementary Figs 1.4 and
1.5). Patients with HPV(+) tumours and, interestingly, patients with HPV(−), TP53 wild-type
tumours demonstrated favourable outcomes compared to those with TP53-mutant
and 11q13/CCND1-amplified tumours.
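The empiric HPV definition used for this cohort (more than 1,000 mapped RNA-Seq reads) reduces to a simple threshold rule, sketched below. The function name and the read counts are hypothetical, and the requirement that reads align primarily to E6 and E7 is not modelled.

```python
HPV_READ_THRESHOLD = 1000  # empiric cut-off stated in the text


def hpv_status(mapped_hpv_reads):
    """Classify a tumour from its count of RNA-Seq reads mapped to HPV."""
    return "HPV(+)" if mapped_hpv_reads > HPV_READ_THRESHOLD else "HPV(-)"


# Hypothetical read counts for three tumours
for reads in (12, 1000, 48210):
    print(reads, hpv_status(reads))
```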


DNA and RNA structural alterations 

Most tumours demonstrated copy number alterations (CNAs) including
losses of 3p and 8p, and gains of 3q, 5p and 8q chromosomal regions
(Fig. 1a, Supplementary Fig. 2.1 and Supplementary Information
section 2) resembling lung squamous cell carcinomas (LUSCs) (Fig. 1a
and Supplementary Figs 2.1 and 2.2). HNSCC genomes showed high
instability with a mean of 141 CNAs (amplifications or deletions) from
microarray data and 62 structural aberrations (chromosomal fusions)
per tumour by ‘high coverage’ whole-genome sequencing (n = 29) (Supplementary
Information section 2.2). We observed 39 regions of recurrent
copy number loss and 23 regions of recurrent copy number gain (q
< 0.1, Supplementary Data 2.1 and 2.2). Both HPV(+) and HPV(−) tumours
contained recurrent focal amplifications for 3q26/28, a region involving
squamous lineage transcription factors TP63 and SOX2 and the
oncogene PIK3CA (Fig. 1b and Supplementary Fig. 2.3).

HPV(+) tumours were distinguished by novel recurrent deletions 
(n = 5 out of 36, 14%) and truncating mutations (n = 3 out of 36, 8%) 
of TNF receptor-associated factor 3 (TRAF3) (Supplementary Figs 2.3 
and 2.4, and Supplementary Data 2.1). TRAF3 is implicated in innate 
and acquired anti-viral responses including Epstein-Barr, HPV and
human immunodeficiency virus (HIV), while loss promotes aberrant
NF-κB signalling. Although TRAF3 inactivation has been reported in
haematological malignancies and nasopharyngeal carcinoma, to our
knowledge this is the first evidence linking TRAF3 to HPV-associated 


*Lists of participants and their affiliations appear at the end of the paper. 
Figure 1 | DNA copy number alterations. a, Copy number alterations by
anatomic site and HPV status for squamous cancers. Lung squamous cell
carcinoma (LUSC, n = 358) and cervical squamous cell carcinoma (CESC,
n = 114). b, Unsupervised analysis of copy number alteration of HNSCC
(n = 279) with associated characteristics. The rectangle indicates chromosome
7 amplifications in the purple cluster. NA, not available.


carcinomas. HPV(+) tumours were also notable for focal amplification
of E2F1 and an intact 9p21.3 region containing the CDKN2A gene
commonly deleted in HPV(−) tumours.

HPV(−) tumours featured novel co-amplifications of 11q13 (CCND1,
FADD and CTTN) and 11q22 (BIRC2 and YAP1), which also contain
genes implicated in cell death/NF-κB and Hippo pathways. HPV(−)
tumours featured novel focal deletions in the nuclear set domain gene
(NSD1) and tumour suppressor genes (for example, FAT1, NOTCH1,
SMAD4 and CDKN2A; Supplementary Fig. 2.3). Recurrent focal amplifications
in receptor tyrosine kinases (for example, EGFR, ERBB2
and FGFR1) also predominated in HPV(−) tumours. Notably, unsupervised
clustering analysis of CNAs identified a mutually exclusive
subset of predominantly oral cavity tumours with reduced CNAs, a pattern
recently described in cancer as ‘M’ class (tumours driven by mutation
rather than CNA) (Fig. 1b). This subset in particular contained a
new three-gene pattern of activating mutations in HRAS, frequently
with inactivating CASP8 mutations, and wild-type TP53. We confirmed
a previously reported favourable clinical outcome in tumours with few
CNAs. The three-gene constellation of wild-type TP53 with mutant
HRAS and CASP8 suggested an alternative tumorigenesis pathway
involving RAS and/or alterations in cell death/NF-κB. Unsupervised
analysis also suggested that clustering was a function of chromosome
7 amplification (including the EGFR locus) in a manner that largely
excluded HPV(+) tumours.

To detect additional structural alterations, we interrogated whole- 
genome and RNA-Seq data (Supplementary Information section 3, 
Supplementary Data 3.1 and Supplementary Fig. 3.1). Known fusion 
oncogenes reported in solid tumours including those involving the ALK, 
ROS or RET genes were not observed in HNSCC. Previously reported 
FGFR3-TACC3 fusions were present in two HPV(+) tumours (Supplementary
Fig. 3.2). Only 1 out of 279 patients showed evidence of the
type III isoform of EGFR (vIII), previously described in HNSCC (Supplementary
Fig. 3.3). Although our investigation did not identify additional
novel oncogenic fusions, several tumours demonstrated exon 1
of EGFR or FGFR3 fused to non-recurrent partners, suggesting potential
promoter swaps for the partner genes (Supplementary Data 3.1). A
low prevalence of an alternative MET transcript with skipped exon 14
was identified in two HPV(−) tumours (Supplementary Fig. 3.4); this
finding was reported to be an activating event in non-small cell lung
cancer. Structural alterations (homozygous deletions, intra- and inter-
chromosomal fusions) were more commonly associated with loss of 
function in tumour suppressor genes, most prominently CDKN2A (Sup- 
plementary Figs 3.5 and 3.6), followed by TP53, RB1, NOTCH1 and 
FAT1 (Supplementary Figs 3.7-3.9), than with protein-coding fusion 
events. RNA-Seq data (Supplementary Data 3.3) demonstrated evid- 
ence of alternative splicing in genes not previously described in HNSCC 
including kallikrein 12 (KLK12) (Supplementary Fig. 3.11), as well as 
genes such as TP63 with known importance in HNSCC (Supplemen- 
tary Fig. 3.12). 

By DNA analysis, most HPV(+) tumours demonstrated clear evid- 
ence of host genome integration, usually in a single genomic location 
per sample and almost always in association with amplifications of the 
host genome (Supplementary Fig. 3.10 and Supplementary Data 3.2). 
Interrogation of RNA transcripts confirmed transcription across the 
viral-human integration locus. However, none of the genes involved 
were recurrent, suggesting no single driver mechanism related to HPV 
integration. Similarly, none of the integration sites involved the MYC 
gene as reported in HPV(+) cell lines.


Somatic mutations 


Whole-exome sequencing identified somatically mutated genes, many
located in regions of CNAs and annotated in the COSMIC database
(Fig. 2). The mean sequencing coverage across targeted bases was 95×,
with 82% of target bases above 30× coverage. In 279 samples, 12,159
synonymous somatic variants, 37,061 non-synonymous somatic variants,
and 2,579 germline single base substitutions from the single
nucleotide polymorphism database (dbSNP) were detected (Supplementary
Information section 4). Targeted re-sequencing of 394
unique regions (Supplementary Fig. 4.1) validated 99% of mutations.
Interrogation of RNA for expression of the mutated alleles confirmed
the variant in 86% of cases (Supplementary Information section 3.2
and Supplementary Fig. 3.1). In contrast to previous reports, the mutation
rates did not differ by HPV status, although transversions at CpG
sites were more frequent in HPV(−) tumours and a predominance of
TpC mutations was noted in HPV(+) cases (Supplementary Fig. 1.1).
Mutations were statistically enriched in 11 genes (Fig. 2). Among inactivating
mutations (premature termination of the protein by nonsense,
frameshift or splice-site mutations), four genes segregated exclusively
or predominantly in HPV(−) tumours. Two were associated with cell
cycle and survival (CDKN2A (P < 0.01) and TP53 (P < 0.01)) and two
were linked to Wnt/β-catenin signalling (FAT1 (P < 0.01) and AJUBA
(P = 0.14)). We observed TP53 mutation among HPV(−) samples
at higher rates (86%) than have been previously reported, while only
1 out of 36 HPV(+) cases had a non-synonymous TP53 mutation.
Previously unreported somatic mutations and deletions of AJUBA were
primarily 5′ inactivating events and clustered missense mutations in
the functional LIM domain (Supplementary Fig. 4.2). AJUBA is a
Figure 2 | Significantly mutated genes in 
centrosomal protein that regulates cell division, vertebrate ciliogenesis
and left-right axis determination. Additionally, AJUBA is subject to
EGFR-RAS-MAPK-dependent phosphorylation and implicated in Hippo
growth and regeneration pathways conserved from Drosophila to mammals,
in ataxia-telangiectasia mutated (ATM) and ATM and Rad-3-related
(ATR)-mediated DNA damage response, and tumour invasion
and migration.

A frequently mutated novel gene, the nuclear receptor binding SET 
domain protein 1 (NSD1), was identified in 33 HNSCCs. Alterations 
included inactivating mutations (n = 29) and focal homozygous deletions
(n = 4). NSD1 is a histone 3 Lys 36 (H3K36) methyltransferase,
similar to SETD2, which is frequently mutated in the clear cell variant
of renal cell carcinoma, and associated with DNA hypomethylation.
Germline carriers of inactivating mutations in NSD1 are associated with
craniofacial abnormalities (Sotos syndrome), and malignancies including
squamous carcinoma, implicating NSD1 as a tumour suppressor
gene. Interestingly, NSD1 functions as an oncogene when fused to
nucleoporin-98 (NUP98) t(5;11)(q35;p15.5) in haematological cancers
with increased H3K36 trimethylation levels at HOXA genes and accompanying
transcriptional activation. Translocations involving other dedicated
H3K36 methyltransferase genes including WHSC1 (also known
as MMSET and NSD2) are reported in 20% of multiple myelomas. By
contrast, NSD1 loss has been associated with sporadic non-melanoma
skin cancers. Significant inactivating mutations were found in genes
linked to squamous differentiation including in NOTCH1 (19%), and 
other non-significant family members (NOTCH2 9%, and NOTCH3 
5%, q > 0.1, non-significant), and the TP63 target gene ZNF750 (4%, 
q > 0.1, non-significant), which falls in a significantly deleted peak at
17q25.3. The analysis identified additional mutations including TRAF3, 
RB1 and NFE2L2, among others with q values < 1 (non-significant). 
The frequently mutated apoptosis gene CASP8 displayed clustered 
missense and other inactivating mutations in the first death effector, 


578 | NATURE | VOL 517 | 29 JANUARY 2015 


intron and caspase peptidase domains. Statistically significant muta- 
tions in KMT2D (also known as MLL2) and HLA-A could contribute 
to defective immunosurveillance. Of known oncogenes, only PIK3CA 
achieved statistical significance (q < 0.01). Approximately one-quarter 
of the mutated PIK3CA cases displayed concurrent amplification, with 
an additional 20% of tumours containing focal amplification without 
evidence of mutation. Seventy-three per cent of PIK3CA mutations lo- 
calized to Glu542Lys, Glu545Lys and His1047Arg/Leu hotspots that 
promote activation, with the remaining mutations of uncertain func- 
tion. Recurrent activating mutations of HRAS in the GTPase domain in 
residues 11-13 approached statistical significance (q = 0.2). 

We extended our unsupervised genome-wide analysis of significantly 
mutated genes as well as genes reported in COSMIC to a subgroup ana- 
lysis by anatomic sites, tumour versus normal status, HPV status and 
four previously validated gene expression subtypes*”** (Supplementary 
Information section 5, Supplementary Figs 5.1-5.4 and Supplementary 
Data 4.1 and 5.1-5.4). Additional mutations included TRAF3, RB1 and
NFE2L2, among others with q values < 1, and we observed statistical 
evidence for mutations of HRAS (q = 0 in COSMIC subset) and other 
genes. Sporadic inactivating mutations and deletions of TGFBR2 were 
identified primarily in oral cavity tumours, consistent with its role in 
promoting squamous tumorigenesis in mouse models™. Investigating 
COSMIC database mutations focused attention on the significant dele- 
tion peak at 4q31.3 containing the gene FBXW7, a ubiquitin ligase tar- 
geting cyclin E and NOTCH genes, in which we identified mutations 
that included recurrent Arg505Gly/Leu substitutions (n = 14). Genes 
with at least one identical mutation previously reported in COSMIC 
include SCN9A, CHEK2, PTCH1 and PIK3R1. We further focused on 
somatic alterations and protein expression that represent plausible 
therapeutic targets (Fig. 3, Supplementary Information sections 6 
and 7, Supplementary Figs 6.1 and 6.2 and Supplementary Data 6.1 
and 6.2). 

Figure 3 | Candidate therapeutic targets and driver oncogenic events. Alteration events for key genes are displayed by sample (n = 279). TSG, tumour suppressor gene.


Integrated genome analysis and pathways 


Correlative genetic alteration analysis identified numerous pairwise 
significant findings (Supplementary Information section 7 and Sup- 
plementary Fig. 7.1). In particular, co-amplification of 11q13 containing 
CCND1, FADD and CTTN and a narrow segment of 11q22 containing 
the genes with equal evidence for YAP1 and BIRC2 was further char- 
acterized (Supplementary Fig. 7.2). Chromosome 11q22 was focally but 
rarely amplified in the absence of co-amplification of 11q13. This novel 
finding suggests that the selection pressure for this co-amplification 
stems from the interaction of BIRC2 with FADD and the caspase cas- 
cade that inhibits cell death. Notably, the vast majority of tumours with 
the 11q13 amplification had large deletions in the telomeric region of 
11q22, including other genes known to be important in cell death in 
cancer such as ATM and CASP1, 4, 5 and 12. Amplification of 11q13 
was anti-correlated with CASP8 mutations, suggesting an alternative 
function of CASP8 and FADD in cell death/NF-κB activation.

We investigated whether clinical factors, single gene alterations and
statistically significant pairwise gene correlations (Supplementary 
Fig. 7.1 and Supplementary Data 7.1) might segregate previously defined 
molecular subtypes and/or anatomic sub-sites (Supplementary Data 
5.1-5.4). We confirmed reported gene expression subtypes (atypical 
(24%), mesenchymal (27%), basal (31%) and classical (18%)), and 
assessed the subtypes for enrichment of somatic alterations (Supplementary
Data 7.1). Notably, TP53 mutation, CDKN2A loss of function,
chromosome 3q amplification, alteration of oxidative stress genes
(KEAP1, NFE2L2 or CUL3), heavy smoking history (Supplementary
Table 1.1) and larynx sub-site co-occurred in most classical subtype tumours
(Fig. 4a and Supplementary Information section 7.2), similar to
LUSC (Supplementary Figs 5.1 and 5.2). Collectively, these findings
suggest that the NFE2L2 oxidative stress pathway is a tobacco-related 
signature across anatomic tumour sites. By contrast, the basal subtype 
demonstrated inactivation of NOTCH1 with intact oxidative stress 


Figure 4 | Integrated analysis of genomic 
alterations. a, b, Samples (n = 279) are displayed
in columns and grouped by gene expression (a) 
or methylation (b) subtype (sub.). Unadjusted 
two-sided Fisher’s exact test P values assess the 
association of each genomic alteration. 

Figure 5 | Deregulation of

signalling and fewer alterations of chromosome 3q. Analysis of the 3q
locus highlighted a marked relative decrease of SOX2 expression in 
basal tumours relative to all other HNSCC and tumour adjacent nor- 
mal samples (Supplementary Fig. 5.4), supporting the interaction of 
transcription factors SOX2, TP63, NFE2L2 and NOTCH1 as driving
differences between expression subtypes. Additionally, the basal sub- 
type included most tumours with the HRAS-CASP8 co-mutation and 
most co-amplified 11q13/q22 tumours. These findings along with HRAS 
mutations implicate disrupted cell death as a major alteration in this 
subtype (Supplementary Fig. 7.2). The atypical subtype was characterized
by a lack of chromosome 7 amplifications (Supplementary Fig. 5.3) and
enrichment of HPV(+) tumours with activating mutations in exon 9,
which contains the PIK3CA helical domain. By contrast, the mesenchymal
subtype showed high levels of alteration in innate immunity genes,
in particular high expression of natural killer cell marker CD56 and 
a low frequency of HLA class I mutations (Supplementary Fig. 7.3). 
Among the significantly mutated genes, TP53 (P < 0.001), CASP8 (P = 
0.01), NSD1 (P = 0.01) and CDKN2A (P = 0.06) were the most differentially
mutated across anatomic sites (Supplementary Data 4.1). Most CASP8 
mutations (22 out of 24, 92%) were in oral cavity tumours, whereas 
TP53, NSD1 and CDKN2A demonstrated decreased mutation rates in 
oropharyngeal tumours relative to other sites. 

Unsupervised analysis of gene expression by HPV status and of 
reverse-phase protein arrays (Supplementary Information section 6), 
DNA methylation (Supplementary Information section 8), and miRNA 
platforms (Supplementary Information section 9, Supplementary Table 
9.1, Supplementary Figs 7.4-7.9 and Supplementary Data 7.1) showed 
high correlation across platforms (P < 0.01; Fig. 4, Supplementary In- 
formation section 7.9 and Supplementary Data 7.2) and coordinated 
alterations of genes including the epithelial-mesenchymal transition 
signature*” (Supplementary Figs 7.4, 7.7 and 7.8). However, within the 
broader cross-platform agreement, individual unsupervised clustering 
of miRNA, reverse-phase protein arrays and DNA methylation data 
provided insight into the association of molecular subtypes with single 
gene alterations. The most notable example was the detection of hypo- 
methylation and loss-of-function mutations of NSD1, and wild-type 
NOTCH in atypical and classical gene expression subtypes (Fig. 4b 
and Supplementary Table 7.1). 

Supervised analyses detected genomic features (miRNA, gene ex- 
pression and DNA methylation) associated with anatomic site (Sup- 
plementary Figs 5.5-5.8 and Supplementary Data 5.1-5.4). A supervised 
integrated analysis identified target genes that are inversely regulated by 
miRNAs in HNSCC (Supplementary Information section 7.4). Among
these miRNA-messenger RNA networks, let-7c-5p and miR-100-5p 
exhibited a correlation between low copy number and expression. 
Let-7c-5p and miR-100-5p were decreased in tumours compared to nor- 
mal (Supplementary Fig. 7.10). For these miRNAs, deletion was highly 
associated with increased expression of target genes, including the cell 
cycle regulator CDK6, transcription factor E2F1 (ref. 38), mitosis regulator
PLK1 (ref. 39), and transcription factor HMGA2 (ref. 40; Supplementary
Fig. 7.10 and Supplementary Tables 7.2 and 7.3).

Integrative bioinformatics analysis identified a limited number of 
pathways targeted by frequent genome alterations (Fig. 5, Supplemen- 
tary Information section 7, Supplementary Figs 7.11-7.15 and Sup- 
plementary Data 7.3). Among receptor tyrosine kinases, EGFR/ERBB2 
or FGFR1/3 alterations are the most frequent. Among downstream tar- 
gets of the receptor tyrosine kinase (RTK)/RAS/phosphatidylinositol- 
3-OH kinase (PI(3)K) pathway, PIK3CA dominates with occasional 
HRAS and PTEN alterations. Further downstream, nearly every tumour
has alteration of genes governing the cell cycle. The tumour suppressors 
TP53 and CDKN2A, oncogenes CCND1 and MYC, and the newly identified
miRNA let-7c, are most often altered in HPV(−) tumours, whereas
viral genes E6, E7 and E2F1 predominate in HPV(+) cases. In addition,
we report frequent alterations in genes involved in cell death,
NF-κB-mediated survival, or immunity pathways. Co-amplification of
FADD + BIRC2, or CASP8 + HRAS mutations define exclusive HPV(−)
subsets, whereas TRAF3 loss characterizes an HPV(+) subset. These alterations
along with PIK3CA and TP63 converge on NF-κB transcription
factors that promote cell survival, migration, inflammation and angiogenesis.
Furthermore, TRAF3 and/or HLA loss are implicated in deregulation
of innate antiviral and adaptive anti-tumour immunity.
Further alterations of NOTCH, TP63 and other genes in HPV(−) tumours
(FAT1 and AJUBA) recently linked functionally to β-catenin
(CTNNB1) are also detected. Finally, we highlight a previously
underappreciated role for a key transcription factor regulator of oxidative
stress, NFE2L2, and its protein complex partners CUL3 and KEAP1
in HPV(−) HNSCCs.


Conclusion 

The TCGA study represents the most comprehensive integrative genomic
analysis of HNSCC. Loss of TRAF3, activating mutations of
PIK3CA, and amplification of E2F1 in HPV(+) oropharyngeal cancers
point to aberrant activation of NF-κB, other oncogenic pathways,
and cell cycle, as critical in the pathogenesis and development of new
targeted therapies for these tumours. In HPV(−) HNSCCs, mutually
exclusive subsets containing amplicons on 11q with CCND1, FADD,
BIRC2 and YAP1, or concurrent mutations of CASP8 with HRAS, also
target cell cycle, death, NF-κB and other oncogenic pathways. Recent
studies predict that the inactivation of AJUBA, as well as FAT1 and
NOTCH1, may converge to uncheck Wnt/β-catenin signalling, implicated
in deregulation of cell polarity and differentiation. The 3q amplicon
found in both HPV(+) and HPV(−) HNSCCs includes transcription factors
TP63, SOX2 and signal molecule PIK3CA, which are also implicated
in homeostasis of epithelial stem cells and differentiation. Among
these, the biological function and agents targeting BIRCs, PI(3)K,
Wnt/β-catenin and NOTCH are under investigation. Collectively, these findings
provide new insights into HNSCC and suggest that shared and
unique alterations might be leveraged to accelerate progress in prevention
and therapy across tumour types.


Genome-scale transcriptional activation
by an engineered CRISPR-Cas9 complex


Silvana Konermann, Mark D. Brigham, Alexandro E. Trevino, Julia Joung, Omar O. Abudayyeh,
Clea Barcena, Patrick D. Hsu, Naomi Habib, Jonathan S. Gootenberg, Hiroshi Nishimasu, Osamu Nureki
& Feng Zhang


Systematic interrogation of gene function requires the ability to perturb gene expression in a robust and generalizable 
manner. Here we describe structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional
activation at endogenous genomic loci. We used these engineered Cas9 activation complexes to investigate single-guide
RNA (sgRNA) targeting rules for effective transcriptional activation, to demonstrate multiplexed activation of ten genes 
simultaneously, and to upregulate long intergenic non-coding RNA (lincRNA) transcripts. We also synthesized a library 
consisting of 70,290 guides targeting all human RefSeq coding isoforms to screen for genes that, upon activation, confer 
resistance to a BRAF inhibitor. The top hits included genes previously shown to be able to confer resistance, and novel 
candidates were validated using individual sgRNA and complementary DNA overexpression. A gene expression 
signature based on the top screening hits correlated with markers of BRAF inhibitor resistance in cell lines and 
patient-derived samples. These results collectively demonstrate the potential of Cas9-based activators as a powerful
genetic perturbation technology.


Achieving systematic, genome-scale perturbations within intact biolog- 
ical systems is important for elucidating gene function and epigenetic 
regulation. Genetic perturbations can be broadly classified as either loss- 
of-function (LOF) or gain-of-function (GOF) on the basis of their mode 
of action. To date, various genome-scale LOF screening methods have 
been developed, including approaches employing RNA interference’* 
and the RNA-guided endonuclease Cas9 from the microbial CRISPR 
(clustered regularly interspaced short palindromic repeat) adaptive 
immune system**. Genome-scale GOF screening approaches have largely 
remained limited to the use of cDNA library overexpression systems. 
However, it is difficult to capture the complexity of transcript isoform 
variance using these libraries, and large cDNA sequences are often dif- 
ficult to clone into size-limited viral expression vectors. The cost and 
complexity of synthesizing and using pooled cDNA libraries have also 
limited their use. Novel technologies that overcome such limitations 
would enable systematic, genome-scale GOF perturbations at endog- 
enous loci. 

Programmable DNA-binding proteins have emerged as an exciting 
platform for engineering synthetic transcription factors for modulating
endogenous gene expression. Among the established custom
DNA-binding domains, Cas9 is most easily scaled to facilitate genome-scale
perturbations owing to its simplicity of programming relative to
zinc finger proteins and transcription activator-like effectors (TALEs).
Cas9 nuclease can be converted into an RNA-guided transcription activator
(dCas9-activator) via inactivation of its two catalytic domains
and fusion to transcription activation domains. These dCas9-activator
fusions targeted to the promoter region of endogenous genes can then
modulate gene expression. Although the current generation of dCas9-based
transcription activators is able to achieve upregulation of some
endogenous loci, the magnitude of transcriptional upregulation achieved
by individual single-guide RNAs (sgRNAs) typically ranges from low
to ineffective. Tiling a given promoter region with several sgRNAs
can produce more robust transcriptional activation, but this requirement
presents enormous challenges for scalability, and in particular for
establishing pooled, genome-wide GOF screens.

To improve and expand the applications of Cas9, we recently undertook
crystallographic studies to elucidate the atomic structure of the
Cas9-sgRNA-target DNA ternary complex, thus enabling rational engineering
of Cas9 and sgRNA. Here we report a series of structure-guided
engineering efforts to create a potent transcription activation complex
capable of mediating robust upregulation with a single sgRNA. Using
this new activation system, we demonstrate activation of endogenous
genes as well as non-coding RNAs, elucidate design rules for effective
sgRNA target sites, and establish and apply genome-wide dCas9-based
transcription activation screening to study drug resistance in a melanoma
model. These results collectively demonstrate the broad applicability of
CRISPR-based GOF screening for functional genomics research.


Structure-guided design of Cas9 complex

Transformation of the Cas9-sgRNA complex into an effective tran- 
scriptional activator requires finding optimal anchoring positions for 
the activation domains. Previous designs of dCas9-based transcription 
activators have relied on fusion of transactivation domains to either the 
amino or carboxy terminus of the dCas9 protein. To explore whether 
alternate anchoring positions would improve performance, we examined
our previously determined crystal structure of the Streptococcus
pyogenes dCas9(D10A/H840A) in complex with a single-guide RNA
(sgRNA) and complementary target DNA. We observed that the tetraloop
and stem loop 2 of the sgRNA protrude outside of the Cas9-sgRNA
ribonucleoprotein complex, with the distal 4 base pairs (bp) of each
stem completely free of interactions with Cas9 amino acid side chains
(Extended Data Fig. 1a). On the basis of these observations, along with
functional data demonstrating that substitutions and deletions in the
tetraloop and stem loop 2 regions of the sgRNA sequence do not affect
Cas9 catalytic function (Fig. 1a), we reasoned that the tetraloop and
stem loop 2 could tolerate the addition of protein-interacting RNA
aptamers to facilitate the recruitment of effector domains to the Cas9
complex (Fig. 1b).

We selected a minimal hairpin aptamer, which selectively binds dimerized
MS2 bacteriophage coat proteins, and appended it to the sgRNA
tetraloop and stem loop 2 (Extended Data Fig. 1b). We next tested 
whether MS2-mediated recruitment of VP64 to the tetraloop and stem 
loop 2 could mediate transcriptional upregulation more efficiently than 
a dCas9-VP64 fusion. As predicted, aptamer-mediated recruitment of 
MS2-VP64 to either tetraloop (sgRNA1.1) or stem loop 2 (sgRNA1.2) 

Figure 1 | Structure-guided design and optimization of an RNA-guided
transcription activation complex. a, A crystal structure of the Cas9-sgRNA-target
DNA ternary complex (PDB ID: 4OO8) reveals that the sgRNA
tetraloop and stem loop 2 are exposed. b, Schematic of the three-component 
SAM system. c, Design and optimization of sgRNA scaffolds for optimal 
recruitment of MS2-VP64 transactivators in Neuro-2a cells. d, MS2 stem loop 
placement within the sgRNA significantly affects transcription activation 
efficiency. e, Combinations of different activation domains act in synergy to 
enhance the level of transcription activation. f, Addition of the HSF1 
transactivation domain to MS2-p65 further increases the efficiency of 
transcription activation. Experiments for d-f were performed in 293FT cells. 
All values are mean ± s.e.m. with n = 3 biological replicates. *P < 0.05 based
on Student’s t-test.

mediated three- and fivefold higher levels of Neurog2 upregulation
than a dCas9-VP64 fusion (sgRNA 1.0), respectively. Recruitment of
VP64 to both positions (sgRNA 2.0) resulted in an additive effect, leading
to a 12-fold increase over dCas9-VP64 (sgRNA 1.0). Combining
sgRNA 2.0 with dCas9-VP64 instead of dCas9 provided an additional
1.3-fold increase in Neurog2 upregulation (Fig. 1c). We further compared
sgRNA 2.0 to a previously described sgRNA bearing two MS2-binding
stem loops at the 3’ end (sgRNA + 2×MS2) and found that
sgRNA 2.0 drove 14- and 8.5-fold higher levels of transcription activation
than sgRNA + 2×MS2 for ASCL1 and MYOD1, respectively (Fig. 1d).
This difference could be due to either improved positioning of MS2 
stem loops or to dCas9 protection of internal MS2 stem loops from 
exonuclease degradation. 

To further improve the potency of Cas9-mediated gene activation, 
we considered how transcriptional activation is achieved in natural 
contexts, where endogenous transcription factors generally act in synergy
with co-factors. We thus hypothesized that combining VP64
with additional, distinct activation domains could improve activation
efficiency. We chose the NF-κB trans-activating subunit p65 that, while
sharing some common co-factors with VP64, recruits a distinct subset 
of transcription factors and chromatin remodelling complexes. For 
example, p65 has been shown to recruit AP-1, ATF/CREB and SP1 
(ref. 17), whereas VP64 recruits PC4 (ref. 18), CBP/p300 (ref. 19), and 
the SWI/SNF complex”’. 

We then varied the effector domain fused to dCas9 or MS2. Hetero- 
effector pairing of dCas9 and MS2 fusion proteins (for example, dCas9- 
VP64 paired with MS2-p65 or dCas9-p65 with MS2-VP64) provided 
over 2.5-fold higher transcription activation for both ASCL1 and MYOD1 
than homo-effector pairing (for example, dCas9-VP64 paired with MS2-
VP64 or dCas9-p65 with MS2-p65) (Fig. 1e). We further explored this
concept of domain synergy by introducing the activation domain from 
human heat-shock factor 1 (HSF1)”’ as a third activation domain, and 
found that an MS2-p65-HSF1 fusion protein further improved tran- 
scriptional activation of ASCL1 (12%) and MYOD1 (37%) (Fig. 1f). 
Additional modifications to the sgRNA as well as Cas9 protein, includ- 
ing varying the nuclear localization signal (NLS), provided only minor 
improvements (Extended Data Fig. 1c—e). On the basis of these collective 
results, we concluded that the combination of sgRNA 2.0, NLS-dCas9- 
VP64 and MS2-p65-HSF1 comprises the most effective transcription 
activation system, and designated it synergistic activation mediator (SAM). 
For simplicity, we will refer to sgRNA 2.0 as sgRNA in subsequent dis- 
cussions, unless noted otherwise. 


Design rules for efficient sgRNAs 


To evaluate thoroughly the effectiveness of SAM for activating endogenous
gene transcription, we chose 12 genes that were previously found
by several groups to be difficult to activate using dCas9-VP64 and individual
sgRNA 1.0 guides. For each gene, we selected 8 sgRNA target
sites spread across the proximal promoter between −1,000 bp and the
+1 transcription start site (TSS). For 9 out of 12 genes, the maximum
level of activation achieved using dCas9-VP64 with any of the 8 sgRNA
1.0 guides was lower than twofold, while the remaining three genes 
(ZFP42, KLF4 and IL1B) were maximally activated between two- and 
fivefold (Fig. 2a). In contrast, SAM stimulated transcription at least 
twofold for all genes and more than 15-fold for 8 out of 12 genes. SAM 
performed consistently better than sgRNA 1.0 + dCas9-VP64 for all 
96 guides, with a median gain of 105-fold greater upregulation across 
all 12 genes (activation by SAM divided by activation by sgRNA 1.0 + 
dCas9-VP64). 
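
The per-guide gain described above (activation by SAM divided by activation by sgRNA 1.0 + dCas9-VP64, summarized as a median) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `fold_gain` and all numeric values are hypothetical and are not data from the study.

```python
from statistics import median


def fold_gain(sam_activation, baseline_activation):
    """Per-guide gain: SAM fold activation divided by baseline fold activation."""
    if len(sam_activation) != len(baseline_activation):
        raise ValueError("both lists must cover the same guides")
    return [s / b for s, b in zip(sam_activation, baseline_activation)]


# Hypothetical fold-activation values for four guides (not data from the study).
sam = [30.0, 200.0, 12.5, 1500.0]   # SAM (sgRNA 2.0 + dCas9-VP64 + MS2-p65-HSF1)
baseline = [1.5, 2.0, 1.25, 3.0]    # sgRNA 1.0 + dCas9-VP64

gains = fold_gain(sam, baseline)    # one ratio per guide
median_gain = median(gains)         # summary statistic across guides
```

The median is used here, as in the text, because fold gains of this kind can span several orders of magnitude and a mean would be dominated by the largest ratios.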

Previous studies have demonstrated that the poor activation efficiency
of single sgRNAs can be overcome by combining dCas9-VP64
with a pool of sgRNAs tiling the proximal promoter region of the target
gene. Therefore we compared the single sgRNA activation efficiency
of SAM against dCas9-VP64 combined with a pool of 8 sgRNA 1.0 
guides, all targeting the same gene. For 10 out of 12 genes, SAM with a 
single sgRNA performed more robustly than dCas9-VP64 with pools 

Figure 2 | Characterization of SAM-mediated gene and lincRNA activation
and derivation of selection rules for efficient sgRNAs. a, Fold activation of
12 different genes plotted against the sgRNA location. sgRNA 1.0 with
dCas9-VP64 (grey), sgRNA 2.0 with dCas9-VP64 and MS2-p65-HSF1 (blue).
b, Comparison of activation efficiency of 12 target genes: dCas9-VP64 and
a single sgRNA 1.0; dCas9-VP64 with a single sgRNA 2.0 and MS2-p65-HSF1,
and dCas9-VP64 with a mixture of 8 sgRNA 1.0s. c, Efficiency of target gene 


of 8 sgRNA 1.0 guides (Fig. 2b). Additionally, inclusion of a third 
activation domain, MS2-p65-HSF1 or MS2-p65-MyoD1, outper- 
formed MS2-p65 alone (Extended Data Fig. 2a). 

Next, we sought to determine factors that contribute to inter- and 
intragenic variability of activation efficiency by different sgRNAs. For
inter-gene variability, differences in activation magnitudes could be due 
to epigenetic factors and/or variation in basal transcription levels. We 
were thus interested in correlating basal transcription with the level of 
transcription activation achieved using SAM. Using the relative tran- 
scriptional levels of target genes in control samples, we observed a highly 
significant correlation between the inverse of basal transcript level and 
the fold upregulation achieved using SAM (Fig. 2c; r = 0.94, P < 0.0001).
This suggests that the basal expression level of each gene largely deter- 
mines the level of activation. 
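
A correlation of this kind, between the inverse of the basal transcript level and the fold upregulation, can be computed with a plain Pearson coefficient. The sketch below is illustrative only: the source does not state how the values were scaled or transformed before correlation, and the basal levels and fold changes shown are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative basal transcript levels and SAM fold upregulation.
basal_level = [0.001, 0.01, 0.1, 1.0]
fold_upregulation = [2000.0, 150.0, 20.0, 3.0]

inverse_basal = [1.0 / b for b in basal_level]
r = pearson_r(inverse_basal, fold_upregulation)
```

A high positive `r` here means that genes with low basal expression show the largest fold upregulation, which is the relationship the paragraph above describes.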

To study the intragenic variability of SAM activity, we aggregated 
the activation data for all 96 guides and found the distance between the 
guide RNA target site and the TSS to be the strongest predictor of acti- 
vation efficiency (Fig. 2d; r = 0.67, P < 0.0001). For all genes, the highest 
levels of activation were consistently achieved by targeting within the 
—200 bp to +1 bp window. This simple design guideline can inform 
the selection of efficient sgRNAs for gene activation. 
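
The design guideline above amounts to a simple positional filter on candidate target sites. The following sketch assumes each candidate is described by its offset in bp relative to the TSS (negative values upstream); the guide names, offsets and the helper `in_activation_window` are hypothetical.

```python
def in_activation_window(offset_bp, upstream_limit=-200, downstream_limit=1):
    """True if a target site lies within the -200 bp to +1 bp window around the TSS."""
    return upstream_limit <= offset_bp <= downstream_limit


# Hypothetical candidate guides with their offsets relative to the TSS (bp).
candidates = {
    "guide_A": -950,
    "guide_B": -180,
    "guide_C": -40,
    "guide_D": -420,
}

# Keep only guides that fall inside the preferred window.
selected = sorted(
    name for name, offset in candidates.items() if in_activation_window(offset)
)
```

In this example only `guide_B` and `guide_C` pass the filter; a real design pipeline would also need to check PAM availability and off-target matches, which are outside the scope of this sketch.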

We also sought to test whether SAM is able to activate non-coding 
elements in addition to protein-coding genes. We chose a diverse set 
of 6 lincRNAs and found that SAM mediated significant upregulation 
of each target (Fig. 2e), with MS2-p65-HSF1 or MS2-p65-MyoD1 
leading to the highest levels of activation for each lincRNA (P < 0.01) 
(Extended Data Fig. 3). We also examined the effect of the most potent 
sgRNA for each lincRNA on the transcription of the nearest coding 
gene. Of all sgRNAs tested, only the sgRNA targeting HOTTIP—the 
only sgRNA located within 500 bp of the neighbouring gene’s TSS—led 
to significant activation of its neighbour (Extended Data Fig. 2b). 


Multiplex gene activation 

The ability to simultaneously modulate gene expression at multiple loci 
would allow for a better understanding of complex genetic and regula- 
tory networks. Using sets of two to ten sgRNAs, we observed successful 
activation of all target genes (>twofold) within all sgRNA combinations 
(Fig. 3a, b and Extended Data Fig. 4). As expected, most genes (exclud- 
ing IL1R2) exhibited a decrease in the amount of upregulation achieved 

activation as a function of baseline expression levels. d, Correlation of gene
activation efficiency with sgRNA targeting position. Activation efficiency of
each sgRNA for the same target gene is normalized against the highest-
activating sgRNA. e, Fold activation of six lincRNA transcripts by SAM (best
sgRNA out of 8 tested). All experiments were performed in 293FT cells. All
values are mean ± s.e.m. with n = 3 biological replicates.


when concurrently targeted with 9 other genes. Interestingly, the relative 
activation levels of each gene changed between multiplex activation 
and single-gene activation experiments (Fig. 3a, b). 

We asked if reduced activation of targets during multiplexing was 
due to the reduced amounts of sgRNA or SAM protein components. 
Surprisingly, diluting the sgRNA expression plasmid by tenfold in single-
gene activation experiments did not reduce activation for all genes 
(Fig. 3c). We found that genes whose levels of activation are reduced 
upon sgRNA dilution also exhibited dampened levels of activation when 
multiplexed (Fig. 3d; r = 0.94, P < 0.001). In contrast, the activation
efficiency of SAM was generally unperturbed by dilution of its protein 
components (dCas9-VP64 and MS2-p65-HSF1) (Extended Data Fig. 5).
Activation efficiency remained stable particularly when all three com- 
ponents were diluted, retaining on average 90% activation efficiency 
across a 50-fold dilution range (Extended Data Fig. 5). This finding 

Figure 3 | Simultaneous activation of endogenous genes using multiplexed
sgRNA expression. a, Activation of individual genes by single sgRNAs with
dCas9-VP64 and MS2-p65-HSF1. b, Simultaneous activation of the same ten
genes using a mixture of ten sgRNAs, each targeting a different gene. c, Effect of
sgRNA dilution on gene activation efficiency. d, Correlation between the
activation efficiency of a single tenfold diluted sgRNA and that of the same
sgRNA delivered within a mixture of ten different-gene targeting sgRNAs. All
values are mean ± s.e.m. with n = 3 biological replicates.

was particularly promising for genome-scale pooled screening applications,
which rely on single-copy lentiviral integration.


Specificity of SAM-mediated activation 


An important consideration for SAM use is its targeting specificity. 
Recent analysis of genome-wide dCas9-binding revealed significant 
concentration-dependent off-target binding”, yet its effect on the spec- 
ificity of transcription modulation remains unclear. To assess SAM spec- 
ificity, we chose HBG1/2 as our target gene, reasoning that globin genes 
would have few downstream targets that could confound our specifi- 
city analysis. We found that SAM specifically activated both HBG1 and 
HBG2 isoforms (P < 0.05, t-test after 0.01 false discovery rate (FDR) 
correction), which share the same TSS (Fig. 4 and Extended Data Fig. 6). 
We also tested two additional non-targeting sgRNAs with guide sequences
that do not share perfect homology with the human genome. For all 
sgRNAs, we found only two additional genes, S100A1 and CYB5R2, to 
be differentially expressed (P < 0.05, t-test after 0.01 FDR correction for 
multiple hypothesis testing) compared with green fluorescent protein 
(GFP)-expressing control (Extended Data Fig. 6) for both non-targeting 
guides. These results suggest that SAM-mediated gene activation is spe- 
cific with minimal off-target activity. 
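
Calling genes differentially expressed after FDR correction means adjusting the per-gene t-test P values for multiple testing. The text does not name the correction procedure, so the sketch below assumes the widely used Benjamini-Hochberg method; the gene labels, P values and the 0.05 cut-off are hypothetical placeholders rather than values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene t-test p-values (not data from the study).
p_by_gene = {"GENE_A": 0.005, "GENE_B": 0.01, "GENE_C": 0.03, "GENE_D": 0.04}
genes = list(p_by_gene)
q = benjamini_hochberg([p_by_gene[g] for g in genes])

# Genes whose adjusted value falls below the chosen threshold.
significant = [g for g, q_value in zip(genes, q) if q_value < 0.05]
```

With thousands of detected genes, the same function applies unchanged; only genes whose adjusted value stays below the threshold would be reported as off-target candidates.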


Genome-scale gene activation screen 

The ability to activate target genes using individual sgRNAs greatly facil- 
itates the development of pooled, genome-scale transcriptional activa- 
tion screening. To develop a SAM-based screening system, we generated 
lentiviral expression vectors that are able to drive robust transcription 
activation at low multiplicity of infection (MOI) (Extended Data Fig. 7a, b). 
Using this lentiviral system, we generated a genome-scale sgRNA library 
consisting of 70,290 guides, targeting every coding isoform from the 
RefSeq database (23,430 isoforms). For each gene, three sgRNAs were 
chosen to target sites within 200 bp upstream of the TSS, which was 
previously determined to provide more efficient activation (Fig. 2d and 
Fig. 5a). 
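
The stated library size is consistent with three guides for each of the 23,430 RefSeq coding isoforms. A short check of that arithmetic, using a hypothetical helper `library_size`:

```python
def library_size(n_isoforms: int, guides_per_isoform: int) -> int:
    """Total number of guides when every isoform receives the same number of sgRNAs."""
    return n_isoforms * guides_per_isoform


N_ISOFORMS = 23_430        # coding isoforms targeted, as stated in the text
GUIDES_PER_ISOFORM = 3     # sgRNAs chosen per target, as stated in the text

total_guides = library_size(N_ISOFORMS, GUIDES_PER_ISOFORM)
```

The product is 70,290, matching the library size reported above.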

Previously we applied genome-scale CRISPR knockout (GeCKO) 
screening’ in A375 (BRAF(V600E)) melanoma cells to identify LOF 
mutations capable of mediating resistance against the BRAF inhibitor 
PLX-4720. Here we sought to use the new SAM sgRNA library to identify 

Figure 4 | Evaluation of SAM specificity. Expression levels in log2(transcripts
per million (TPM)) values of all detected genes in RNA-seq libraries of GFP-
transfected controls (x axis of all graphs) compared to (from left to right):
SAM targeting HBG1/2 genes in 1× dilution and 50× dilution, non-targeting
control sgRNAs in 1× dilution and 50× dilution (y axis). Marked are the
two most statistically significant differentially expressed genes (t-test q
value < 0.05 with FDR correction): red, HBG1; blue, HBG2. The average from
n = 3 biological replicates is shown.

Figure 5 | Genome-scale gene activation screening identifies mediators of
BRAF inhibitor resistance. a, Flow-chart of transcription activation screening
using SAM. Blast, blasticidin; Hygro, hygromycin; Puro, puromycin; PLX,
PLX-4720; Zeo, zeocin. b, Box-plot showing the distribution of sgRNA
frequencies post lentiviral transduction for baseline (day 3), vehicle (day 21),
and PLX-4720 (day 21) conditions from n = 2 infection replicates.
c, Scatterplot showing enrichment of specific sgRNAs after PLX-4720
treatment. d, Identification of top candidate genes using the RIGER P value
analysis based on the average of both infection replicates. e, Comparison of
RIGER P values for the top 100 hits from SAM and GeCKO PLX-4720
resistance screens. f, Consistency of sgRNAs for top screening hits. Fraction
of unique sgRNAs targeting each gene that are in the top 5% of all sgRNAs
is plotted.


a complementary set of GOF changes that can confer BRAF inhibitor 
resistance (Fig. 5a). 

We found that at 14 days post drug treatment, the sgRNA distribu- 
tion was significantly different between cells treated with PLX-4720 and 
with vehicle, with the majority of sgRNAs exhibiting a reduced repre- 
sentation and a small set of guides showing high enrichment in PLX- 
4720-treated cells (Fig. 5b and Extended Data Fig. 7c). For a number of 
gene targets, multiple sgRNAs targeting the same gene were enriched
in PLX-4720-treated cells (Fig. 5c) and the 10 most significant hits were
distributed throughout the genome (Fig. 5d and Extended Data Fig. 7d).
The significance of the P values of our top 100 hits determined by RNAi
gene enrichment ranking (RIGER) (Supplementary Tables 1 and 2) was
comparable to that observed for GeCKO screening (Fig. 5e). In addition,
for the top 10 shared hits between two independent screens (zeocin
and puromycin selection for sgRNA expression), the fraction of effectively
enriched guides per gene (present in the top 5% of all guides) was
very high, with 97% for zeocin and 81% for puromycin (89% ± 10.7%
overall, compared to 78% ± 27% for the top 10 GeCKO hits; Fig. 5f and
Extended Data Fig. 7e).

Our screen results highlight a number of gene candidates that both 
confirm known PLX-4720 resistance pathways and suggest new mech- 
anisms (Extended Data Fig. 7f). First, reactivation of the ERK pathway 

Figure 6 | Validation of top hits from genome-scale gene activation screen
for PLX-4720 resistance mediators. a, Comparison of PLX-4720 resistance,
transcription activation and protein upregulation in A375 cells for top
screening hits. b, Expression levels of top hits and screen signatures are elevated
in the resistant state of short-term BRAF(V600) melanoma cultures (see


is one of the main known resistance mechanisms, and two of our
screening hits, BCAR3 and EGFR, probably modulate downstream and
upstream nodes of this pathway, respectively. EGFR has been previously
validated as a mediator of resistance to PLX-4720 through PI3K-AKT,
in addition to ERK. These two pathways are thought to be
alternative routes of PLX-4720 resistance. Furthermore, four out
of the top 10 hits from our screen belong to the family of
G-protein-coupled receptors (GPCRs: GPR35, LPAR1, LPAR5 and P2RY8), which
emerged as the top-ranked protein class conferring resistance to multiple
MAP kinase inhibitors in melanoma cells in a recent screen using
cDNA overexpression. GPCRs signal through multiple downstream
pathways including ERK and AKT as well as cAMP-PKA. The final
class of protein candidates from our screen belongs to the ITG receptor
family, which is thought to interact with RTK and activate both ERK
and AKT pathways.

To verify the results from the PLX-4720 resistance screen, we vali- 
dated each of the top 13 genes. All sgRNAs from the screen that tar- 
geted these 13 genes conferred PLX-4720 resistance when individually 
expressed in A375 along with SAM (Fig. 6a and Extended Data Fig. 8a). 
We also verified that SAM was able to facilitate a robust increase in target
transcript (Fig. 6a and Extended Data Fig. 8b) and protein levels (Fig. 6a).
Since 5 of our top candidates from the pooled SAM screen overlapped
with hits from a previously conducted arrayed cDNA screen (Extended
Data Fig. 8c), we compared the relative efficacy of cDNA overexpres- 
sion with SAM-mediated transcription activation. Interestingly, for 
these 5 targets, SAM led to at least similar levels of PLX-4720 resistance 
when compared with corresponding cDNA overexpression conditions 
(Extended Data Fig. 8a), despite cDNA leading to higher transcript 
levels (Extended Data Fig. 8d). Furthermore, we found that, for most 
genes, the levels of PLX-4720 resistance mediated by all three sgRNAs 
were comparable (Extended Data Fig. 8e). 

In addition to validating our top screening hits through individual 
sgRNA or cDNA overexpression, we analysed the expression profile 
of our screening hits using four different data sets (CCLE, TCGA:
https://tcga-data.nci.nih.gov/tcga/, short-term melanoma cultures, and
pre/post treatment patient samples). As shown previously, a distinct
transcriptional state defines BRAF-inhibition sensitive and resistant
states as described by activation of endogenous MITF/associated markers
(for example, PMEL) and NF-κB-pathway activity/associated markers
(for example, AXL), respectively (Fig. 6b and Extended Data Fig. 9b).
Based on short-term melanoma data, we found that the expression
of our top screening hits was significantly increased in the resistant
state. Correspondingly, a gene expression signature based on the top
screening hits (see Methods) correlated with a BRAF-inhibitor resistance
state as defined previously (Fig. 6b; total overlap, P < 0.0001).
Further analysis performed using the CCLE, TCGA and pre/post treat- 
ment data set also revealed similar correlations (Extended Data Fig. 9). 


Methods for signature generation). The subset of samples which were
previously tested for PLX-4720 sensitivity and resistance are indicated by blue
and red arrows, respectively. IC, information coefficient. All values are
mean ± s.e.m. with n = 4 biological replicates.


Discussion 


In summary, we have taken a structure-guided approach to design a 
dCas9-based transcription activation system for achieving robust, single 
sgRNA-mediated gene upregulation. By engineering the sgRNA to incor- 
porate protein-interacting aptamers, we assembled a synthetic transcrip- 
tion activation complex consisting of multiple distinct effector domains 
modelled after natural transcription activation processes. Here we have 
shown that the SAM system is robust, specific, and can facilitate genome- 
scale gain-of-function screening when combined with a compact pooled 
sgRNA library. Our SAM-mediated screens exhibited a high degree of 
consistency and validation, with >80% effectively enriched guides per 
gene hit, and 100% validation of the top 10 hits. 

Future engineering of the Cas9 complex based on structural information
will further expand the Cas9 toolbox. Additional developments
of the SAM system may be able to take advantage of the modularity 
and customizability of the sgRNA scaffold to establish a series of sgRNA 
scaffolds bearing different aptamers for recruiting distinct types of 
effectors in an orthogonal manner. For instance, replacement of the 
MS2 stem loops with PP7-interacting stem loops may be used to recruit 
repressive elements, potentially enabling multiplexed bidirectional tran- 
scriptional control. 

Although we have taken initial steps towards defining selection rules 
for potent sgRNAs, future studies will reveal additional selection criteria 
that are critical for guide efficacy, such as sequence-intrinsic properties 
(Extended Data Fig. 10a-d). Applications of dCas9-based transcription
modulators in positive and negative selection screens (Extended Data
Fig. 10e, f) will enable the dissection of many types of genetic elements,
ranging from protein-coding genes to non-coding lincRNA elements. 
Furthermore, combining wild-type Cas9-mediated genome modifica- 
tions with SAM-mediated recruitment of epigenetic modifiers will con- 
stitute powerful approaches for studying genome organization and 
regulation in diverse biological processes. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS


Sequences. DNA sequences for SAM components and sgRNA scaffolds are pro- 
vided in Supplementary Sequences. sgRNA target sequences for characterization 
and optimization of SAM are listed in Supplementary Table 4. 

Transient transfection experiments. Neuro-2a cells (Sigma-Aldrich) were grown 
in media containing 1:1 ratio of OptiMEM (Life Technologies) to high-glucose 
DMEM with GlutaMAX and sodium pyruvate (Life Technologies) supplemented with 
5% HyClone heat-inactivated FBS (Thermo Scientific), 1% penicillin/streptomycin 
(Life Technologies), and passaged at 1:5 every 2 days. 

HEK293FT cells (Life Technologies) were maintained in high-glucose DMEM
with GlutaMAX and sodium pyruvate (Life Technologies) supplemented with 10%
heat-inactivated characterized HyClone fetal bovine serum (Thermo Scientific) and
1% penicillin/streptomycin (Life Technologies). Cells were passaged daily at a ratio
of 1:2 or 1:2.5. For gene activation experiments, 20,000 HEK293FT cells per well were
plated in 100 µl media in poly-D-lysine-coated 96-well plates (BD BioSciences).
24 h after plating, cells were transfected with a 1:1:1 mass ratio of three
components: (1) sgRNA plasmid with gene-specific targeting sequence or pUC19
control plasmid; (2) MS2-effector plasmid or pUC19; and (3) dCas9 plasmid,
dCas9-effector plasmid or pUC19.

A total plasmid mass of 0.3 µg per well was transfected using 0.6 µl per well of
Lipofectamine 2000 (Life Technologies) according to the manufacturer’s
instructions. Culture medium was changed 5 h after transfection. 48 h after
transfection, cell lysis and reverse transcription were performed using a
Cells-to-Ct kit (Life Technologies). Relative RNA expression levels were quantified
by reverse transcription and quantitative PCR (qPCR) using TaqMan qPCR probes
(Life Technologies, Supplementary Table 5) and Fast Advanced Master Mix (Life
Technologies). qPCR was carried out in 5 µl multiplexed reactions and 384-well
format using a LightCycler 480 Instrument II. Data were analysed by the ΔΔCt
method: target Ct values (FAM dye) were normalized to GAPDH Ct values (VIC dye),
and fold changes in target gene expression were determined by comparing to
GFP-transfected experimental controls.
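
As an illustration of the ΔΔCt calculation described above, the following minimal sketch computes a fold change from threshold-cycle values. It assumes roughly 100% amplification efficiency (a doubling per cycle); the function name and the Ct values are hypothetical and are not taken from the paper.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Relative expression by the delta-delta-Ct method.

    Assumes ~100% PCR efficiency, so one cycle corresponds to a twofold change.
    """
    # Normalize the target gene to the reference gene (here GAPDH) in each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    # Compare the activated sample with the control (here GFP-transfected cells).
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 8 cycles earlier
# in the activated sample than in the control, with GAPDH unchanged.
fold = fold_change_ddct(22.0, 18.0, 30.0, 18.0)
print(fold)
```

A lower Ct in the activated sample gives a negative ΔΔCt and therefore a fold change greater than 1.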

Lentivirus production. HEK293T cells (Life Technologies) were cultured as
described above for HEK293FT cells. 1 day before transfection, cells were seeded
at ~40% confluency (12 × T225 flasks for library scale production, 1 × T25 flask
for individual guide production). Cells were transfected the next day at ~80-90%
confluency. For each flask, 10 µg of plasmid containing the vector of interest, 10 µg
of pMD2.G and 15 µg of psPAX2 (Addgene) were transfected using 100 µl of
Lipofectamine 2000 and 200 µl Plus Reagent (Life Technologies). 5 h after
transfection the media was changed. Virus supernatant was harvested 48 h
post-transfection, filtered with a 0.45-µm PVDF filter (Millipore), aliquoted, and
stored at −80 °C.

Lentiviral transduction. A375 cells (Sigma) were cultured in RPMI 1640 (Life
Technologies) supplemented with 10% FBS (Seradigm) and 1% penicillin/streptomycin
(Life Technologies) and passaged every other day at a 1:4 ratio. Cells were
transduced with lentivirus via spinfection in 12-well plates. 3 × 10⁶ cells in 2 ml
of media supplemented with 8 µg ml⁻¹ polybrene (Sigma) were added to each well,
supplemented with lentiviral supernatant and centrifuged for 2 h at 1,000g. 24 h
after spinfection, cells were detached with TrypLE (Life Technologies) and counted.
Cells were replated at low density (7.5 × 10⁶ cells per T225 flask) and a selection
agent was added either immediately (zeocin, blasticidin and hygromycin, all Life
Technologies) or 3 h after plating (puromycin). Concentrations for selection agents
were determined using a kill curve: 0.5 µg ml⁻¹ puromycin, 200 µg ml⁻¹ zeocin,
10 µg ml⁻¹ blasticidin, and 300 µg ml⁻¹ hygromycin. Media was refreshed on day 2 and cells
were passaged every other day starting on day 4 after replating. The duration of 
selection was 4 days for puromycin and 7 days for zeocin, hygromycin and
blasticidin. Lentiviral titres were determined by spinfecting cells with 6 different
volumes of lentivirus ranging from 0 to 600 µl and counting the number of surviving cells
after a complete selection (3-6 days). 

Design and cloning of SAM library. RefSeq coding gene isoforms with a unique 
TSS (total of 23,430 isoforms) were targeted with three guides each for a total library 
of 70,290 guides (Supplementary Table 6). Guides were designed to target the first 
200 bp upstream of each TSS and subsequently filtered for GC content >25% and 
minimal overlap of the target sequence. After filtering, the remaining guides were 
scored according to predicted off-target matches as described previously, and
three guides with the best off-target scores were selected. Cloning of the SAM
sgRNA libraries was performed as previously described with a minimum
representation of 100 transformed colonies per guide.
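
The guide selection steps described above can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: the dictionary keys are hypothetical, the off-target score is assumed to follow a "higher is better" convention, and the filter for minimal overlap between target sequences is omitted.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a guide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def select_guides(candidates, n_guides=3, window_bp=200, min_gc=0.25):
    """Pick guides for one TSS from a list of candidate dicts.

    Hypothetical keys (assumptions for this sketch):
      'seq'              guide sequence
      'upstream_bp'      distance upstream of the TSS, in bp
      'offtarget_score'  assumed convention: higher = fewer predicted off-targets
    """
    # Keep guides within the window upstream of the TSS and with GC content > 25%.
    eligible = [
        c for c in candidates
        if 0 < c["upstream_bp"] <= window_bp and gc_fraction(c["seq"]) > min_gc
    ]
    # Rank by off-target score and keep the best n_guides.
    ranked = sorted(eligible, key=lambda c: c["offtarget_score"], reverse=True)
    return ranked[:n_guides]


# Library size stated in the text: three guides for each of 23,430 isoforms.
LIBRARY_SIZE = 23_430 * 3  # 70,290
```

Applied once per unique TSS, a routine of this shape would yield the three-guides-per-isoform layout of the library.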

Depletion and PLX-4720 screen. A375 cells stably integrated with SAM Cas9 
and effector components were transduced with SAM sgRNA libraries as described 
above at an MOI of 0.2, with a minimal representation of 500 transduced cells per 
guide. Cells were maintained at >500 cells per guide during subsequent passaging. 
At 7 days post infection (DPI) (complete selection, see above), cells were split into 
vehicle (DMSO) and PLX-4720 conditions (2 µM PLX-4720 dissolved in DMSO,
Selleckchem). Cells were passaged every 2 days for a total of 14 days of drug treatment. 
>500 cells per guide were harvested as a baseline at 3 DPI (4 days before treatment) 
and at 21 DPI (after 14 days of treatment) for gDNA extraction. Genomic DNA
was extracted using the Zymo Quick-gDNA midi kit (Zymo Research). PCR of the
virally integrated guides was performed on gDNA at the equivalent of >500 cells
per guide in 96 parallel reactions using NEBnext High Fidelity 2× Master Mix
(New England Biolabs) in a single-step reaction of 22 cycles. Primers are listed
here: forward primer,
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNN(1-10-bp stagger)GCTTTATATATCTTGTGGAAAGGACGAAACACC,
8-bp barcode indicated in italic; reverse primer,
CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCAAGTTGATAACGGACTAGCCTT,
8-bp index read barcode indicated in italic.

PCR products from all 96 reactions were pooled, purified using Zymo-Spin V
with Reservoir (Zymo Research) and gel extracted using the Zymoclean Gel DNA
Recovery Kit (Zymo Research). Resulting libraries were deep-sequenced on Illumina
MiSeq and HiSeq platforms with a total coverage of >35 million reads passing
filter per library.
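
The representation figures quoted for the screen follow from simple arithmetic, sketched below. The conversion from MOI to the fraction of transduced cells assumes Poisson-distributed integration events, which is an assumption of this sketch rather than a statement from the paper; the function name is hypothetical.

```python
import math


def screen_cell_numbers(n_guides, cells_per_guide, moi):
    """Return (transduced cells needed, cells to expose to virus).

    Assumes the number of integrations per cell is Poisson-distributed with
    mean equal to the MOI, so a fraction 1 - exp(-MOI) of cells is transduced.
    """
    transduced_needed = n_guides * cells_per_guide
    fraction_transduced = 1.0 - math.exp(-moi)
    cells_to_infect = math.ceil(transduced_needed / fraction_transduced)
    return transduced_needed, cells_to_infect


# Values from the text: 70,290 guides, 500 cells per guide, MOI of 0.2.
transduced, to_infect = screen_cell_numbers(70_290, 500, 0.2)
print(transduced)   # 35145000 transduced cells for 500-fold representation
print(to_infect)
```

Under the same 500-fold target, the >35 million reads per library correspond to roughly 500 reads per guide.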

NGS and screen hits analysis. NGS data were de-multiplexed using unique index
reads. Guide counts (Supplementary Table 7) were determined based on perfectly
matched sequencing reads only. For each condition, guide counts were normalized
to the total number of counts per condition, and log2 counts were calculated based
on these values. Ratios of counts between conditions were calculated as
log2((count 1 + 1)/(count 2 + 1)) based on normalized counts.
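
A minimal sketch of the count normalization and ratio calculation described above is given here. The scaling constant (counts per million) is an assumption made so that the pseudocount of 1 remains small relative to the normalized counts; the text does not state the scale used. Function names and the toy counts are hypothetical.

```python
import math


def normalize_counts(counts, scale=1_000_000):
    """Scale guide counts so that each condition sums to `scale`.

    The choice of scale (here one million) is an assumption of this sketch.
    """
    total = sum(counts.values())
    return {guide: c * scale / total for guide, c in counts.items()}


def log2_ratio(norm_a, norm_b):
    """log2((count_a + 1) / (count_b + 1)) per guide, on normalized counts."""
    return {g: math.log2((norm_a[g] + 1) / (norm_b[g] + 1)) for g in norm_a}


# Toy example: guide g1 is enriched under drug, g2 is depleted.
plx = normalize_counts({"g1": 300, "g2": 100})
dmso = normalize_counts({"g1": 100, "g2": 300})
ratios = log2_ratio(plx, dmso)
print(ratios)
```

The pseudocount keeps the ratio defined for guides with zero reads in one condition.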

RIGER analysis was performed using GENE-E based on the normalized day 14
log2 ratios (PLX-4720/DMSO) averaged over two independent infection replicates.
All RIGER analysis used the Kolmogorov-Smirnov method as described previously,
except for Fig. 5e, where the weighted average method was used in order to enable
comparison to GeCKO values determined by that method.

Gene expression and pharmacological validation analysis. Gene expression data
(CCLE, TCGA, short-term cultures, patient melanoma biopsies) and pharmacological
data (CCLE, short-term cultures) were analysed to better understand the biological
relevance of the top gene hits from the SAM screens. In the CCLE data set, gene
expression data (RNA-sequencing, CGHub: https://cghub.ucsc.edu/datasets/ccle.html)
and pharmacological data (activity area for MAPK pathway inhibitors) from
BRAF(V600) mutant melanoma cell lines were used to compute the association
between PLX-4720 resistance and the gene expression of each of the top hits.
Additionally, gene expression signatures comprised of the top hits were generated using
single-sample Gene Set Enrichment Analysis (ssGSEA), and the associations between
PLX-4720 resistance and these signatures were computed.

Gene expression data (Affymetrix GeneChip HT-HGU133) and PLX-4720
pharmacological data (GI50, half-maximal growth inhibition concentration; only for a
subset of the samples) from short-term melanoma cultures (STC) were also used
for plotting the gene expression of top hits and their ssGSEA signature scores.
Expression data for the STC samples were collapsed to maximum probe value per 
gene and pre-processed using robust spline normalization. 

Gene expression (RNA-sequencing) and genotyping data were collected from 
113 BRAF(V600)-mutant primary and metastatic patient tumours from The Cancer 
Genome Atlas (https://tcga-data.nci.nih.gov/tcga/) and these data were similarly 
used for determining the association between resistance and the expression of top 
hits/ssGSEA signature scores. Because pharmacological data was not available for 
the STCs (only a subset had PLX-4720 data) and the TCGA melanoma samples, a 
transcriptional state was plotted using marker genes and signatures in order to
identify samples resistant to BRAF-inhibition. 

Gene expression data from 13 patients with BRAF(V600E) melanomas were
used for analysing the relationship between resistance and the expression of our
top hits/ssGSEA signature scores. Because all the post-treatment tumours were
resistant and not every sample had a paired on-treatment biopsy, we decided to
order the samples by MITF expression in the pre-treatment samples to reflect the
original PLX-4720 sensitivity state of the tumours. We then used the expression
data in the post-treatment resistant tumours to plot the expression of top hits/
ssGSEA signature scores. We also calculated the log2-fold change between each
patient’s post/pre paired samples and determined the number of patients that had
at least a log2-fold change of 2 per top screen hit.

Single sample gene set enrichment analysis. While there was a significant asso- 
ciation between the overexpression of some of our top individual SAM screen hits 
and resistance in three external cancer data sets, we sought a more robust scoring 
system independent of any single gene. Gene expression signatures were generated 
based on the set of top hits from each of the two SAM screens and for the overlap 
between them. Using single-sample Gene Set Enrichment analysis (ssGSEA), a 
score was generated for each sample that represents the enrichment of the SAM 
screen gene expression signature in that sample and the extent to which those genes 
are coordinately up- or downregulated. Additionally, signature gene sets from the 
Molecular Signature Database (MSigDB) were used in order to fully map the
transcriptional BRAF-inhibitor resistant/sensitive states in the short-term culture
and TCGA data sets as previously described.

Information coefficient for measuring associations in external data sets. To 
measure correlations between different features (signature scores, gene expression, 
or drug-resistance data) in the external cancer data sets, an information-theoretic 
approach (Information Coefficient; IC) was used and significance was measured 
using a permutation test (n = 10,000), as previously described. The IC was
calculated between the feature used to sort the samples (columns) in each data set and
each of the features plotted in the heatmap (pharmacological data, gene expression, 
and signature scores). 
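
The permutation-based significance estimate can be illustrated with the sketch below. Note the substitution: the Information Coefficient itself is not implemented here, and a Pearson correlation is used as a stand-in association statistic purely to show the permutation scheme. All names are hypothetical, and the two-sided comparison and the +1 correction are assumptions of this sketch.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; used here only as a stand-in for the IC."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, statistic=pearson, n_perm=10_000, seed=0):
    """Empirical two-sided P value from shuffling y relative to x."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(statistic(x, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With 10,000 permutations, the smallest P value this estimate can return is about 1 × 10⁻⁴.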

sgRNA sequence analysis. Depletion for each sgRNA was calculated as the ratio 
of counts (see “NGS and screen hits analysis”) between day 3 and day 21. The 
sgRNAs corresponding to genes with significant depletion (P < 0.05 by RIGER anal- 
ysis) in sgRNA-Puro and sgRNA-Zeo libraries were selected for analyses. These 
sgRNAs were analysed for nucleotide occurrence in the sgRNA sequence, distance 
from TSS, and guide strand relative to transcript orientation. For each variable, the 
correlation and significance with the sgRNA ratio was calculated by ordinary least 
squares linear regression. 

PLX-4720 survival assay. A375 cells stably integrated with dCas9-VP64 and MS2- 
p65-HSF1 were transduced with individual guides from the top screening hits of 
the Zeocin and Puromycin screens (13 genes total, 3 sgRNAs per gene) as well as 
available cDNA at an MOI of <0.2 as described above. Cells were selected for guide 
expression with Zeocin (Life Technologies) for 5 days and replated at low density 
(3 X 10° cells per well in a 96-well plate). A375 cells and A375 cells expressing dCas9- 
VP64 and MS2-p65-HSF1 were plated as controls. Different concentrations of 
PLX-4720 (2 µM, 0.5 µM, 0.15 µM) or vehicle (DMSO) were added 3 h after
plating. Cells were treated with PLX-4720 for 4 days before cell viability was measured
using CellTiter-Glo Luminescent Cell Viability Assay (Promega). For qPCR quan- 
tification of target gene upregulation, cells were also plated at 5 DPI (3 X 10° cells 
per well in a 96-well plate) and harvested for mRNA 24h after plating. 

Western blot. Protein lysates were prepared with RIPA lysis buffer (Cell Signal- 
ing Technologies) containing a protease inhibitor cocktail (Roche). Samples stan- 
dardized for protein with the Pierce BCA protein assay (Thermo Scientific) were 
boiled at 95 °C for 5 min under reducing conditions (except for GPR35 samples, 
which were incubated at 37 °C for 30 min). After denaturation, samples for prob- 
ing proteins with lower or higher molecular weight were separated by 10-20% or 
4-15% Criterion Tris-HCl gels (Bio-Rad) and electrotransferred onto a 0.2-µm or
0.45-µm polyvinylidene difluoride membrane (Millipore), respectively. Blots were
blocked with 5% BLOT-QuickBlocker (VWR) and probed with different primary
antibodies (anti-EGFR (rabbit polyclonal, SC-03, Santa Cruz Biotechnology, 1:1,000
dilution), anti-PCDH7 (rabbit polyclonal, HPA011866, Sigma-Aldrich, 1:1,000
dilution), anti-ITGB5 (rabbit polyclonal, SC-14010, Santa Cruz Biotechnology, 1:500
dilution), anti-ARHGEF1 (rabbit polyclonal, 11363-1-AP, Proteintech, 1:5,000 dilu- 
tion), anti-BCAR3 (rabbit polyclonal, A301-671A, Bethyl Laboratories, 1:2,000 dilu- 
tion), anti-GPR35 (rabbit polyclonal, 10007660, Cayman Chemical, 1:1,000 dilution), 
anti-TFAP2C (rabbit polyclonal, 2320, Cell Signaling Technology, 1:1,000 dilution, 
2.5% bovine serum albumin, Sigma-Aldrich)) in 2.5% BLOT-QuickBlocker (VWR) 
unless noted otherwise overnight at 4 °C. Blots were then incubated with secondary 
antibody HRP-conjugated goat anti-rabbit IgG (7074, Cell Signaling Technology,
1:1,000 dilution) and HRP-conjugated GAPDH (rabbit monoclonal, 3683, Cell
Signaling Technology, 1:2,000 dilution) in 2.5% BLOT-QuickBlocker (VWR) for
1 h at room temperature. Proteins with molecular weights similar to GAPDH (GPR35
and TFAP2C) were stripped with Restore Plus Western Blot Stripping Buffer (Thermo 
Scientific) before probing for GAPDH. SuperSignal West Pico and Femto Chemi- 
luminescent Substrates (Thermo Scientific) were used for detection. 

RNA sequencing and data analysis. Samples harvested for RNA sequencing were 
prepped with TruSeq Stranded mRNA Sample Prep Kit (Illumina) and deep- 
sequenced on the Illumina MiSeq platform (>9 million reads per condition). 
Bowtie2 index was created based on the human hg19 UCSC genome and known
gene transcriptome, and paired-end reads were aligned directly to this index using
Bowtie2 with command line options “-q --phred33-quals -n 2 -e 99999999 -l 25 -I 1
-X 1000 -a -m 200 -p 4 --chunkmbs 512”. Next, RSEM v1.27 was run with default
parameters on the alignments created by Bowtie2 to estimate expression levels.
RSEM’s gene level expression estimates (tau) were multiplied by 1,000,000 to obtain
transcript per million (TPM) estimates for each gene, and TPM estimates were
transformed to log-space by taking log2(TPM + 1). The normalization between libraries
was tested using an MA plot (mairplot function in Matlab V2013b). Genes were
considered detected if their transformed expression level was equal to or above 1
(in log2(TPM + 1) scale). All genes detected in at least one library (out of three
libraries per condition) were used to construct scatter plots comparing each of the
six conditions to the control GFP condition, using the average across biological
replicates with >80% alignment to the hg19 UCSC known gene transcriptome
(log2(mean(TPM) + 1) value per gene).

To find differentially expressed genes, we performed Student’s t-test on each of 
the six conditions against the GFP condition. The t-test was run on all genes that 
had expression levels of log2(TPM + 1) > 2.5 in at least two libraries. This
threshold was chosen as the minimal threshold for which the number of detected genes
across all libraries was constant. Only genes that were significant (P value passing
FDR correction at 0.01) and had at least 1.5-fold change were reported and visualized
using a heat map.
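
The expression transform, detection threshold and multiple-testing correction described above can be sketched as follows. The text specifies only "FDR correction"; the Benjamini-Hochberg procedure used here is an assumption of this sketch, and all function names are hypothetical.

```python
import math


def log2_tpm(tau):
    """Convert an RSEM expression fraction (tau) to log2(TPM + 1)."""
    tpm = tau * 1_000_000
    return math.log2(tpm + 1)


def is_detected(tau, threshold=1.0):
    """A gene counts as detected if log2(TPM + 1) is at least the threshold."""
    return log2_tpm(tau) >= threshold


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (q values), in input order.

    Assumed here as the FDR procedure; the paper does not name the method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

Genes would then be reported when their q value is below the chosen cutoff and the fold change is at least 1.5.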

Extended Data Figure 1 | Structure-guided engineering of Cas9 sgRNA.
a, Schematic of the sgRNA stem loops showing contacts between each stem
loop and Cas9. Contacting amino acid residues are highlighted in yellow.
Tetraloop and stem loop 2 do not make any contacts with Cas9, whereas stem
loops 1 and 3 share extensive contacts with Cas9. b, sgRNA 2.0 with MS2
stem loops inserted into the tetraloop and stem loop 2. c, Addition of a second
NLS or an alternative HNH domain inactivating point mutation in Cas9
improves efficiency of transcription activation for MYOD1 moderately.
d, dCas9-VP64 activators exhibit improved performance by recruitment of
MS2-p65 to the tetraloop and stem loop 2. Addition of an AU flip or extension
in the tetraloop does not increase the effectiveness of dCas9-mediated
transcription activation. e, Tetraloop and stem loop 2 are amenable to
replacement with MS2 stem loops. Base changes from the sgRNA 2.0 scaffold
are shown at the respective positions, with dashes indicating unaltered bases
and bases below dashes indicating insertions. Deletions are indicated by
absence of dashes at respective positions. All figures are n = 3 and
mean ± s.e.m.

Extended Data Figure 2 | SAM mediates efficient activation of a panel of
12 coding genes and 6 lincRNAs. a, Comparison of the activation levels of
12 genes with dCas9-VP64 in combination with MS2-p65, MS2-p65-HSF1,
or MS2-p65-MyoD1. MS2-p65-HSF1 mediated significantly higher levels of
activation than MS2-p65 alone for 9 out of 12 genes. The best guide out of
8 tested for each gene (Fig. 2a) was used in this experiment. Activation levels for
each type of MS2-fusion are presented as a percentage relative to the activation
achieved using MS2-p65. b, Investigation of transcriptional changes in the
closest coding transcripts for SAM-mediated activation of 6 lincRNAs.
Direction of the coding transcript relative to the lincRNA and distance between
transcription start sites are shown. Only targeting of HOTTIP resulted in a
significant change in the levels of the closest coding transcript (HOXA13).
The best guide out of 8 tested for each gene (Fig. 2e) in combination with
dCas9-VP64 and MS2-p65-HSF1 was used in this experiment. All figures are
n = 3 and mean ± s.e.m.

Extended Data Figure 3 | Activation of lincRNAs by SAM. Six lincRNAs,
three characterized and three uncharacterized, were targeted using SAM.
For each lincRNA, 8 sgRNAs were designed to target the proximal promoter
region (+1 to −800 bp from the TSS) with 4 different MS2 activators
(MS2-p65-HSF1, MS2-p65-MyoD1, MS2-p65, and MS2-VP64) in
combination with dCas9-VP64. MS2 activators with a combination of 2
different domains (MS2-p65-HSF1 or MS2-p65-MyoD1) consistently
provided the highest activation for each lincRNA, *P < 0.01 for MS2-p65-HSF1
or MS2-p65-MyoD1 versus MS2-p65. n = 3 and mean ± s.e.m. is shown.

Extended Data Figure 4 | Multiplexed activation using SAM and
activation of a panel of 10 genes as a function of SAM component dosage.
a, Activation of a panel of 10 genes by combinations of 2, 4, 6 or 8 sgRNAs
simultaneously. The mean fold upregulation is shown on a log scale.
MS2-p65-HSF1 and dCas9-VP64 were used in this experiment. b, The
relative activation efficiency of individual sgRNAs varies depending on the
target gene and the degree of multiplexing. n = 3 and mean ± s.e.m. is shown.

Extended Data Figure 5 | The effect of guide and SAM-component dilution
on target activation. a, The results for dilution of sgRNA 2.0 on target
activation. b, The result for dilution of sgRNA 1.0 on target activation. # denotes
an activation of <twofold at 1× guide dilution. c, Effect of MS2-p65-HSF1
and dCas9-VP64 dilution, at 1:1, 1:4, 1:10 and 1:50 of the original dosage for
each component, on the effectiveness of transcription upregulation. The
amount of sgRNA expression plasmid was kept constant. d, Effect of diluting
all three SAM components (dCas9-VP64, MS2-p65-HSF1, and sgRNA) at 1:4,
1:10, and 1:50 of the original dosage for each component. Fold upregulation
is calculated using GFP-transfected cells as the baseline. Error bars indicate
s.e.m. and n = 3 for all figures.

Extended Data Figure 6 | RNA-seq analysis of transcriptome changes
mediated by SAM. a, A heat map of log(TPM) expression values of all
statistically significant differentially expressed genes (t-test q value < 0.05
adjusted with FDR multiple hypothesis correction) found in any of the
six experimental conditions compared to the GFP-transfected control.
b, Expression levels in log(TPM) values of all detected genes in RNA-seq
libraries of GFP-transfected controls (x axis of all graphs) compared to
(from left to right): non-targeting control sgRNA no. 2 in 1× dilution and
50× dilution (y axis). Marked are HBG1 (red) and HBG2 (blue).
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Extended Data Figure 7 | Genome-scale lentiviral screen using 
puromycin-resistant SAM sgRNA library. a, Design of three lentiviral vectors 
for expressing sgRNA, dCas9-VP64, and MS2-p65-HSF1. Each vector 
contains a distinct selection marker to enable co-selection of cells expressing all 
three vectors. b, Lentiviral delivery of SAM components was tested by first 
generating 293FT cell lines stably integrated with dCas9-VP64 and 
MS2-p65-HSF1, and subsequently transducing these cells with single-gene targeting 
lentiviral sgRNAs at MOI <0.2. Transcription activation efficiency is measured 
4 days post sgRNA lentivirus transduction and selection with zeocin or 
puromycin. Activation is at least as effective as previously observed with 
transient transfection in all three cases. c, Box-plot showing the distribution of 
sgRNA frequencies at different time points post lentiviral transduction with the 
Puromycin library, after treatment with DMSO vehicle or PLX-4720. Two 
infection replicates are shown. d, Identification of top candidate genes using the 
RIGER P value analysis (KS method) based on the average of both infection 
replicates. Genes are organized by positions within chromosomes. e, Overlap 
between the top 20 hits from the zeocin and puromycin screens. Genes 
belonging to the same family are indicated by the same colour. There is a 50% 
overlap between the top hits of each screen as shown in the intersection of the 
Venn diagram. f, Relevant signalling pathways in BRAF inhibitor resistance. 
Reactivation of the Ras-ERK pathway as well as the parallel PI3K-Akt pathway 
have previously been implicated as two alternative resistance mechanisms to 
BRAF inhibitors. Both pathways have been described as stimulating 
proliferation and survival. BAD, FOXO and p27 are common inhibited 
downstream targets. Recently, stimulation of the cAMP-CREB pathway by 
GPCRs has been described as a potential additional resistance mechanism. 
Top candidates from our screen are indicated in blue and putative connections 
to all three pathways are shown. Candidates previously validated to 
mediate PLX-4720 resistance are underlined in green. COT and CREB are 
independently validated mediators of resistance. 
Extended Data Figure 8 | Individual validation of PLX-4720 resistance 
mediation by top screen hits. a, Validation of the top 10 Zeo screen hits 
and the top 10 shared hits (13 genes total). Every gene was independently 
activated by all three guides from the screen and tested for the ability to increase 
survival of A375 cells treated with three different concentrations of PLX-4720 
(2 µM, 0.5 µM and 0.15 µM). The z-score based on the % increase in survival 
relative to control (A375 cells transduced with dCas9-VP64 and 
MS2-p65-HSF1 alone) is shown for each guide and PLX-4720 concentration. Five 
cDNAs available from a previous large-scale gain-of-function PLX-4720 
resistance screen were also included. Every guide for each top hit mediates 
significant PLX-4720 resistance. b, The same panel of top hits exhibits a large 
range of basal expression levels and is effectively activated by all guides. The 
expression level relative to the housekeeping gene GAPDH is shown both at 
baseline as well as after activation by each individual guide. c, Ranks of the 
validated set of genes in the previous ORF screen. Six genes were not part of 
the cDNA library, five hits are shared (present in the top 3%) and only LPAR5 
and ARHGEF1 were present but not highly ranked. Both of these genes had 
highly ranked members of the same family. d, Levels of overexpression from 
the five tested cDNA constructs. Transcript levels were higher for these five 
cDNAs than those mediated by SAM for the same genes. e, Correlation of 
survival at 2 µM PLX-4720 treatment and transcript upregulation achieved 
by individual guides. For most genes (9 out of 12 shown), the percent survival is 
very similar across transcript levels achieved by all three guides. Dotted lines 
indicate control survival. 
Extended Data Figure 9 | Expression of top hits and screen signatures are 
elevated in PLX-4720 resistant melanoma cell lines and patient samples. 
a, Heat map showing sensitivity to different drugs (top), expression of SAM 
top screen hits (middle), and SAM screen signature scores (bottom; see 
Methods for signature generation) in Cancer Cell Line Encyclopedia cell lines. 
Drug sensitivities are measured as Activity Areas (AA). The melanoma cell 
lines are sorted by PLX-4720 drug sensitivity. RAF inhibitors: PLX-4720 and 
RAF265; MEK inhibitors: AZD6244 and PD-0325901. b, Heat map showing 
expression of gene/signature markers for BRAF-inhibitor sensitivity (top), 
expression of SAM top screen hits (middle) and screen signature scores 
(bottom) in different BRAF(V600) patient melanoma samples (primary or 
metastatic) from The Cancer Genome Atlas. c, Heat map showing MITF 
expression (top), screen signature scores (middle), and expression of SAM top 
screen hits (bottom) in different BRAF(V600E) patient melanoma biopsies 
post-treatment with BRAF inhibitors. d, Bar chart showing the number of 
patients (out of 13 total) from c with at least a twofold change (post/pre- 
treatment) in gene expression of the top PLX-4720 screen hits in the post- 
treatment samples. All associations are measured using the information 
coefficient (IC) between the index and each of the features and P values are 
determined using a permutation test. All heat maps show Z scores. 
Extended Data Figure 10 | Guide depletion analysis to identify gene set 
enrichment and guide efficiency parameters. a, b, Heat maps of sgRNA 
nucleotide content versus depletion after 21 days. sgRNA targeting significantly 
depleted genes (from RIGER analysis) in sgRNA-zeo (a) or sgRNA-puro 
(b) screens were analysed for trends based on G or T content in the sgRNA 
sequence. sgRNA depletion is positively correlated with G content and 
negatively correlated with T content. Other bases analysed (A and C) had 
significant (P < 0.0007) but weak (r < 0.2) negative correlation. c, 90% of 
guides analysed fall within a 100-bp window < 200 bp from the TSS. Boxplots 
of distance from 5’ end of the guide to the TSS for sgRNA-zeo and sgRNA-puro 
in same and reverse direction (relative to target transcription). Whiskers span 
5th to 95th quartile. d, Coefficients and P values for ordinary least squares 
predicting sgRNA depletion of significantly depleted genes from G content, 
T content, distance from 5’ end of the guide to the TSS and direction of guide. 
Only nucleotide content has a significant effect on depletion in this model, 
consistent with a high efficiency of guides within 200 bp of the TSS regardless of 
strand orientation (Fig. 2d). e, The cumulative frequency of sgRNAs 3 and 
21 days after transduction in A375 cells is shown. Shift in the 21-day curve 
represents the depletion in a subset of sgRNAs. Less than 0.1% of all guides are 
not detected at day 3 (detected by less than 10 reads). f, Depleted guides 
(Supplementary Table 3) can be analysed for significant clustering of gene 
categories. Gene categories exhibiting significant depletion based on Ingenuity 
Pathway Analysis (P < 0.01 after Benjamini-Hochberg FDR correction) are 
shown. Categories based on the 1,000 most depleted guides individually (left) 
and the average of all 3 guides/gene (right). These categories include either 
positive or negative regulators of each pathway that reduce proliferation 
and survival. 
A spin-down clock for cool stars from observations 
of a 2.5-billion-year-old cluster 


Søren Meibom, Sydney A. Barnes, Imants Platais, Ronald L. Gilliland, David W. Latham & Robert D. Mathieu 


The ages of the most common stars, low-mass (cool) stars like the Sun 
and smaller, are difficult to derive because traditional dating methods 
use stellar properties that either change little as the stars age or are 
hard to measure. The rotation rates of all cool stars decrease 
substantially with time as the stars steadily lose their angular 
momenta. If properly calibrated, rotation can therefore act as a 
reliable determinant of their ages based on the method of 
gyrochronology. To calibrate gyrochronology, the relationship between 
rotation period and age must be determined for cool stars of different 
masses, which is best accomplished with rotation period measurements 
for stars in clusters with well-known ages. Hitherto, such measurements 
have been possible only in clusters with ages of less than about one 
billion years, and gyrochronology ages for older stars have been 
inferred from model predictions. Here we 
report rotation period measurements for 30 cool stars in the 2.5- 
billion-year-old cluster NGC 6819. The periods reveal a well-defined 
relationship between rotation period and stellar mass at the cluster 
age, suggesting that ages with a precision of order 10 per cent can be 
derived for large numbers of cool Galactic field stars. 

Prior observations in star clusters with ages < 300 million years 
(Myr) have shown that cool stars begin their main-sequence phase with 
a dispersion in their rotation periods, P, spanning two orders of 
magnitude (0.1-10 d). However, this dispersion diminishes rapidly with 
cluster age, t, as they lose angular momentum through magnetically 
channelled winds, causing their periods to increase and converge to a 
well-defined relationship with stellar mass, M, by the age (600 Myr) of 
the Hyades cluster. These observations suggest that cool main-sequence 
stars older than the Hyades probably occupy a single surface, 
P = P(t, M), in P-t-M space, which can be defined by measurements 
of their periods and photometric colours (a proxy for stellar mass) in a 
series of age-ranked clusters (Fig. 1). Measuring stellar rotation peri- 
ods in clusters older than the Hyades will confirm or deny the exist- 
ence of such a surface, and, if it exists, define its shape and thickness. 

Models of cool-star rotational evolution also describe a convergence 
of rotation with age. However, they differ in their predictions of the 
location and shape of the P-t-M surface and, beyond the common 
assumption that the rotation period of the Sun is typical for stars of its mass 
and age, were until 2011 unconstrained at ages greater than 600 Myr. 
Observations to define the P-t-M surface at older ages are therefore 
required to extend our knowledge of the P-t-M relations and thereby 
allow the ages of individual cool stars in the field to be derived from their 
measured periods and colours by the method of gyrochronology. 

The rotation period of a cool star can be determined from small 
(≲1%) periodic modulations in its brightness as rotation carries 
starspots across the stellar disc. Older stars have fewer and smaller spots, 
making their periods harder to detect. Accordingly, observations from 
ground-based telescopes have been unable to detect rotation periods in 
clusters older than the Hyades. Although periods are increasingly being 
measured for isolated field stars, their ages, unlike those of cluster stars, 
are not known to a precision adequate to calibrate gyrochronology. 

With an age of 2.5 billion years (Gyr) (ref. 23), NGC 6819 bridges 
the large gap in age between the Sun and existing cluster observations 
(Fig. 1). The cluster was within the field of view of NASA’s Kepler 
satellite, permitting a time-series photometric survey of its members by 
The Kepler Cluster Study. Cool cluster members were selected for 
observation using prior ground-based photometry for the cluster, a 
90 yr-baseline proper-motion study and multi-epoch radial-velocity 
measurements over 15 yr (ref. 26 and an ongoing survey by S.M.) 
(Methods). The precision, cadence and duration of the Kepler photometry 
enable us to measure rotation periods for cool stars much older than the 
Hyades. Previous results from The Kepler Cluster Study in the 1 Gyr-old 
cluster NGC 6811 confirmed the existence of a unique relationship 
between P and M for cool stars at that age, and measured a median 
rotation period of 10.8 d for solar-mass stars. Since then, a study to 
measure periods from Kepler data in NGC 6819 was carried out by another 
group. That work was limited to cluster stars of greater than solar 
mass, and was thus unable to define a P-M relationship for cool stars. 
Figure 1 | The schematic P-t-M surface for cool stars. The hypothetical 
relationship between rotation period, age and colour extrapolated (yellow) to 
greater ages from the colour-period relations in young clusters using a 
particular P-t relationship, and assuming that the Sun (marked by the black 
solar symbol; ⊙) resides on it. The blue line indicates the locus of stars in 
NGC 6819 for which we have determined rotation periods. The dark grey lines 
at ages of 0.6 and 1 Gyr represent prior observations in the Hyades and 
NGC 6811 clusters, respectively. Stellar masses in solar units are marked on 
the surface at the corresponding colours. (Figure adapted from ref. 16.) 
We have measured rotation periods for 30 cool stars in NGC 6819 
(Extended Data Figs 1-5). All 30 stars are both photometric (from their 
location in the cluster’s colour-magnitude diagram) and kinematic (from 
measurements of their proper motions and radial velocities) members 
of the cluster (Methods). Their de-reddened colour indices, (B − V)₀, 
range from 0.41 to 0.89 mag, corresponding to a stellar mass range from 
~1.4 to 0.85 solar masses. Their periods range from 4.4 to 23.3 d and are 
displayed in the resulting colour-period diagram (CPD) for NGC 6819 
(Fig. 2). Extended Data Table 1 lists all relevant properties for the 30 
stars. The stars form a single and narrow sequence in the CPD. This se- 
quence defines a clear dependence of increasing stellar rotation period 
(decreasing rotation rate) on increasing stellar colour (decreasing mass) 
and represents the cross-section of the hypothesized P-t-M surface at 
t = 2.5 Gyr (Fig. 1, blue line). 

The solar-mass stars (defined here as those with 0.62 mag ≤ (B − V)₀ 
≤ 0.68 mag) all have periods between 17.36 and 18.70 d (mean, 18.2 d; 
s.d., 0.4 d), implying that the Sun’s rotation period was probably in that 
range when it was the age of NGC 6819. With the 10.8 d median period 
for solar-mass stars in NGC 6811, and the mean solar photometric 
rotation period of 26.1 d, this implies a Skumanich-type spin-down (P 
varies as t^(1/2)) for solar-mass stars over the 3.6 Gyr interval measured. 
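
The spin-down statement above can be checked with a short calculation. The sketch below is illustrative only and is not part of the authors' analysis: it takes the three (age, period) values quoted in the text for solar-mass stars and computes the power-law index n in P ∝ t^n between consecutive points. The function names are hypothetical, and treating each pair of points as an exact power law is an assumption.

```python
import math

# (age in Gyr, rotation period in days) for solar-mass stars, as quoted in the text:
# NGC 6811 (median), NGC 6819 (mean) and the Sun (mean photometric period).
ANCHORS = [(1.0, 10.8), (2.5, 18.2), (4.56, 26.1)]


def spin_down_exponent(t1, p1, t2, p2):
    """Return n in P ∝ t**n implied by two (age, period) points."""
    return math.log(p2 / p1) / math.log(t2 / t1)


def pairwise_exponents(anchors):
    """Exponents for each consecutive pair of anchors, ordered by age."""
    ordered = sorted(anchors)
    return [
        spin_down_exponent(t1, p1, t2, p2)
        for (t1, p1), (t2, p2) in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    for n in pairwise_exponents(ANCHORS):
        print(f"n = {n:.2f}")
```

An index close to 0.5 corresponds to the Skumanich-type relation; the calculation ignores the measurement uncertainties on both the periods and the ages.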

The relatively small number of periods detected for stars with 
(B − V)₀ from 0.47 to 0.57 mag does not reflect a lack of cluster members 
(Extended Data Fig. 6). A similar pattern was seen for NGC 6811, and 
is unsurprising to photometrists, who know that such stars show little 
variability. The colour index (B − V)₀ = 0.47 mag separates stars with 
radiative envelopes from those with convective envelopes and is 
associated with the onset of effective magnetic wind braking (the ‘break in 
the Kraft curve’). The rotation periods of the more massive stars in 
NGC 6819 ((B − V)₀ < 0.47 mag) are scattered around a median of 
4.8 d, demonstrating a steeper spin-down (P varies as t) from a median 
period of 1.3 d in the 1 Gyr-old cluster NGC 6811. 

The 30 rotation periods were determined by Lomb-Scargle 
periodogram analysis of long-cadence (30 min exposures) Kepler light curves 
spanning ~3.75 yr (Methods). The rotation period for a given star was 
Figure 2 | The colour-period diagram for NGC 6819. The distribution of 
rotation periods as a function of de-reddened colour index (B − V)₀ for 30 cool 
photometric, proper-motion, and radial-velocity members of the 2.5 Gyr open 
star cluster NGC 6819. The measurements define a tight dependence of 
rotation period on colour (mass). The symbols and error bars respectively 
indicate the means and standard deviations of multiple measurements for 
the same star when available. The location of the Sun (4.56 Gyr) in the diagram 
is marked with a grey solar symbol. Stellar masses in solar units are given 
along the top horizontal axis at the corresponding colours. Solar-mass stars 
with (B − V)₀ between 0.62 and 0.68 mag (interval marked by grey line near 
the bottom horizontal axis) have a mean period of 18.2 d with a standard 
deviation of 0.4 d. 
determined from subsections of the full light curve, chosen to avoid and 
minimize the effect on the measured period of multiple spots or spot 
groups, or that of trends not removed by the data processing, or both. 
For all periods reported we have manually examined the periodogram 
and the phased and unphased light curves, determined the periods 
independently using the CLEAN algorithm, and assessed the level of 
contamination from neighbouring stars (Methods and Extended Data 
Fig. 7). 

We have also measured the projected rotation velocities (vsin(i), where 
v is the stellar rotation velocity and i is the inclination angle between the 
stellar spin axis and the observer’s line of sight) spectroscopically for 25 
of the 30 stars. These vsin(i) values are fully consistent with the pho- 
tometric rotation periods (Extended Data Fig. 8). The resolution of the 
spectra for the five remaining stars is too low to provide meaningful 
constraints on their rotation velocities. 

The measured rotation periods in NGC 6819 establish that the P-t- 
M surface is well defined at ages beyond 1 Gyr. Together with prior 
observations in younger clusters, they specify the location, shape and 
thickness of the surface to an age of 2.5 Gyr. 

The CPD is the projection of this surface onto the colour-period 
plane. Therefore, in this diagram, the measured rotational sequence for 
NGC 6819 separates stars younger (below the sequence) and older (above 
the sequence) than 2.5 Gyr, independent of any theoretical model of 
stellar rotational evolution. Because angular momentum loss in cool 
single stars is driven by internal processes independent of environment 
(pathological tidally interacting systems are an exception to this, but 
tidal effects on rotation are not a concern for ~97% of cool field stars; 
see Methods), it is the same for both cluster and field stars. Therefore, 
this classification, and, more generally, the relationship between rota- 
tion and age, must also be valid for cool field stars. 

The relative scatter (ΔP/P) about the NGC 6819 rotational sequence 
is ~10% for our entire sample, ~5% for stars with (B − V)₀ > 
0.55 mag and ~2% (0.4 d) for the domain surrounding solar-mass stars 
that we are able to define particularly well. This scatter includes con- 
tributions from period measurement uncertainties, and the residual 
effects of both stellar differential rotation and the spread in initial pe- 
riods on the ‘zero-age main sequence’. Its relatively small size demon- 
strates that these effects do not prevent the determination of age from 
rotation, and that the P-t-M surface is intrinsically thin at this age, im- 
plying that ages determined from spin-down (‘gyro ages’) will be precise. 

The derivation of such ages requires a model. A number of such 
models exist, individually differing with respect to the functional 
forms of the underlying variables, and even with respect to what those 
specific variables are. The NGC 6819 rotation period data permit a com- 
parison between the predictions of these rotational evolution models for 
its age and the actual measurements. Figure 3 shows this comparison. 

The (unaltered) model of gyrochronology from ref. 11 provides a 
good fit to the data over nearly the full colour range of the observations. 
To test the precision of gyrochronology we may thus treat the coeval 
NGC 6819 stars as individual field stars, and ask what age the model of 
ref. 11 would provide for each of the 21 best-measured stars, that is, those 
with (B − V)₀ colour index between 0.55 and 0.9 mag (masses between 
~1.1 and ~0.85 that of the Sun). Every one of these stars returns a gyro 
age between 2 and 3 Gyr, with a roughly Gaussian distribution centred 
at 2.49 Gyr (Extended Data Fig. 9). The standard deviation of the 21 
ages is 0.25 Gyr (10% of the mean gyro age), implying that ages of this 
precision can be derived for similarly well-measured field stars, despite 
the effects of measurement errors, differential rotation and a spread in 
initial rotation periods. 

The mean age of 2.49 Gyr also represents the gyro age for NGC 6819. 
The standard error in this cluster age is 0.056 Gyr (that is, a 2% uncer- 
tainty, ignoring possible systematic errors in gyrochronology). The clus- 
ter gyro age thus agrees to within the uncertainty with the classical stellar 
evolution age of the cluster, implying that gyrochronology is 
well-calibrated at 2.5 Gyr. 
Figure 3 | Comparison between gyrochronology models and the NGC 6819 
CPD. The predictions from four different models of cool star rotation 
periods at 2.5 Gyr are plotted against the measured periods in NGC 6819. 
All plotted models predict an observed increase in rotation period with 
increasing (B − V)₀ colour (decreasing stellar mass). The colour-period 
relation from ref. 11 fits the observations for stars with (B − V)₀ > 0.55 mag. 
The symbols and error bars respectively indicate the means and standard 
deviations of multiple measurements for the same star when available. The 
location of the Sun (4.56 Gyr) is marked with a grey solar symbol. 
Stellar masses in solar units are given along the top horizontal axis at the 
corresponding colours. The colour range for solar-mass stars is marked with a 
solid grey line near the bottom horizontal axis. The orange horizontal line 
for (B − V)₀ < 0.47 mag marks the median period of 4.8 d for stars in this 
colour range (~1.2-1.4 solar masses). P₀ in the model from ref. 11 refers to 
the initial (zero-age main sequence) rotation period. P₅₀ in the model from 
ref. 17 refers to the 50th percentile rotation period for a given stellar mass. 


We conclude that gyrochronology can provide accurate and precise 
ages for large numbers of cool stars with measured rotation periods. 
Such ages will enable us to study how astrophysical phenomena invol- 
ving cool stars evolve over time, and will therefore be important to a wide 
range of research from the Galactic scale down to the scale of individual 
stars and their companions. 
METHODS 


1. Cluster membership for the 30 stars with measured rotation periods 

1.1. Radial-velocity membership. The common space motion of the stars in a 
cluster is an effective way to distinguish them from foreground or background 
stars in the Galactic disk. Using the Hydra and Hectochelle multi-object 
spectrographs on the WIYN 3.5 m and MMT 6.5 m telescopes we have measured radial 
(line-of-sight) velocities over 15 yr for more than 4,300 stars within a circular 1°- 
diameter field centred on NGC 6819 (ref. 26 and an ongoing survey by S.M.). The 
Hydra spectra cover a 25 nm window centred on 513 nm and have a resolution of 
~20,000. The Hectochelle spectra have resolutions of ~40,000 over a 15 nm win- 
dow centred at 522 nm. For the late-type stars in NGC 6819 these spectral ranges are 
rich with narrow absorption lines and are thus well suited for radial-velocity 
measurements. Our radial-velocity measurement precision for stars of spectral types F, 
G and K (masses from ~1.3 to 0.7 solar masses) is ~0.4 km s⁻¹ for stars brighter 
than 18.5 mag in the V band. 

Against the broad radial-velocity distribution of Galactic field stars in the 
direction of the cluster, the members of NGC 6819 populate a distinct peak with a mean 
radial velocity of +2.6 ± 0.8 km s⁻¹. The uncertainty represents the velocity 
dispersion among the stars caused by internal cluster dynamics, binary orbital motions 
and observational errors. For a given star, the probability of cluster membership 
(P_RV) is calculated from simultaneous fits of separate Gaussian functions to the 
cluster (F_C) and field (F_F) radial-velocity distributions. The probability is defined as the 
ratio of the cluster-fitted value over the sum of the cluster- and field-fitted values at 
the star’s radial velocity (RV): 


$$P_{RV} = \frac{F_C(RV)}{F_C(RV) + F_F(RV)}$$


The 30 stars in NGC 6819 with measured rotation periods are all radial-velocity 
members of the cluster (P_RV > 50%) and their membership probabilities are given 
in Extended Data Table 1. The radial-velocity measurements also suggest that none 
of the 30 stars are in short-period binary systems, and, thus, that their angular 
momentum evolution is not affected by tidal interactions. 
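
As an illustration of the membership definition above, the following minimal sketch evaluates the ratio of a cluster Gaussian to the sum of cluster and field Gaussians at a given radial velocity. The cluster mean and width (2.6 and 0.8 km s⁻¹) are taken from the text; the Gaussian amplitudes and the field-star mean and width are hypothetical placeholders, since the fitted values are not given here, and the function names are not from the original analysis.

```python
import math


def gaussian(x, amplitude, mean, sigma):
    """Un-normalised Gaussian, as would be fitted to a velocity histogram."""
    return amplitude * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def membership_probability(rv, cluster, field):
    """Cluster-fitted value over the sum of cluster- and field-fitted values at rv.

    `cluster` and `field` are (amplitude, mean, sigma) tuples in km/s.
    """
    f_c = gaussian(rv, *cluster)
    f_f = gaussian(rv, *field)
    total = f_c + f_f
    # Far from both distributions the two terms can underflow to zero.
    return f_c / total if total > 0.0 else 0.0


# Cluster mean and dispersion from the text; amplitudes and field
# parameters are assumed values chosen only for illustration.
CLUSTER = (100.0, 2.6, 0.8)
FIELD = (10.0, -10.0, 30.0)

if __name__ == "__main__":
    for rv in (2.6, 4.0, 10.0):
        print(rv, round(membership_probability(rv, CLUSTER, FIELD), 3))
```

A star is counted as a radial-velocity member when this probability exceeds 50%, the threshold stated above.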

1.2. Proper-motion membership. For NGC 6819, a number of archival photo- 
graphic plates are available, taken between 1919 and 1973 with long-focus telescopes. 
A total of 23 photographic plates were digitized using the Space Telescope Science 
Institute’s GAMMA II multi-channel microdensitometer. Combining the 
measurements of these plates with matching second-epoch high-spatial-resolution CCD 
images, obtained in 2009 with the MegaCam camera on the 3.6 m 
Canada-France-Hawaii Telescope (CFHT), allows us to derive accurate proper motions for stars in 
the field of NGC 6819. Proper motions were calculated for 15,750 stars down to 
22 mag in V over a 40 arcmin × 40 arcmin field centred on NGC 6819. The 
accuracy of the proper motions for well-measured stars is ~0.2 mas yr⁻¹. For the inner 
parts (within a 15 arcmin radius from the centre) of the cluster this accuracy holds 
for stars brighter than 18 mag in V. Considering that the intrinsic dispersion of field 
star proper motions in the direction of NGC 6819 is about 3 mas yr⁻¹, this high 
accuracy enables a clean separation of cluster stars from field stars. Cluster 
membership probabilities (P_μ) were calculated using 


$$P_\mu = \frac{\Phi_{\mathrm{cluster}}}{\Phi_{\mathrm{cluster}} + \Phi_{\mathrm{field}}}$$


where Φ_cluster and Φ_field are the two-dimensional Gaussian frequency 
distributions of the cluster and field stars, respectively. 

The majority (21) of the 30 stars with measured rotation periods have P_μ > 90%. 
For the remaining stars P_μ is lower because they are either located in the periphery 
of the cluster or their proper-motion errors are higher than expected at their 
apparent magnitudes. The latter usually is due to some degree of stellar image overlap. 

1.3. Photometric membership. The colour-magnitude diagram (CMD; Extended 
Data Fig. 6) provides a third set of criteria for cluster membership. In the CMD, 
cluster members trace a well-defined relationship between stellar mass (B − V 
colour index) and luminosity (brightness, V). Extended Data Fig. 6 shows V and 
(B − V)₀ for proper-motion members of NGC 6819. The cluster members form a 
clearly visible diagonal band in the CMD and a ‘hockey-stick’-like turn-off near 
(B − V)₀ ~ 0.45 mag and V ~ 16 mag. The locations of the 30 stars with measured 
rotation periods are marked with larger red circles. They are located on the cluster 
band in the CMD, making them photometric members of NGC 6819. The two 
members near (B − V)₀ = 0.49 mag are photometric binaries. The combined light 
from the two stars in the binary system places them above the cluster sequence. For 
these two stars, three and four radial-velocity measurements over 1,732 d and 1,074 d, 
respectively, show radial-velocity variations near our measurement precision, sug- 
gesting that these stars are members of relatively wide binary systems. 

2. Data and data analysis. Stars identified as members of NGC 6819 were added 
to the list of targets for the Kepler mission as part of The Kepler Cluster Study. 
The 30 stars for which we measure rotation periods were observed by Kepler for 
2.5 yr, on average, over ~3.75 yr. NGC 6819 was located in the part of the Kepler 
field of view covered by the CCD module that failed in January 2010. Since then, 
targets in NGC 6819 could be observed for only three of the four quarters each 
year. 

Stellar rotation periods were derived from Kepler data summed into long-cadence 
(~30 min) bins. The data were processed by version 8.0 of the Kepler mission’s 
data analysis pipeline and corrected by the Kepler Presearch Data Conditioning 
(PDC) module of the pipeline with an additional Bayesian maximum a posteriori 
(MAP) approach for the removal of systematics while preserving astrophysical 
signals such as rotational modulation. Before performing our period search, all 
quarters of the corrected data were normalized by the median signal and joined 
together to form a single light curve for each star. We used Lomb-Scargle 
periodogram analysis to detect periodic variability, searching 20,000 frequencies 
corresponding to periods between 0.05 and 100 d. The rotation period for a given star 
was determined from between one and seven separate time intervals distributed 
over the full light curve. The median peak-to-peak amplitude of variability for all 
period detections is 4 mmag, with a range of 1 to 125 mmag. When more than one 
rotation period measurement was possible for a given star, the mean period was 
calculated and used. For all stars, the periodogram and the raw and phased light 
curves were examined by eye. The periods were derived independently by both 
S.M. and S.A.B. using different analysis tools and algorithms. Extended Data Figs 
1-5 show examples, for all 30 stars, of PDC-MAP corrected light curve intervals 
used to measure their rotation periods. The figures also show the light curves 
phase-folded on the periods, and the corresponding periodogram for each star. 
Extended Data Table 1 lists, for each of the 30 stars, basic astrometric and pho- 
tometric data, the radial-velocity and proper-motion cluster membership prob- 
abilities, the number of period measurements, and their period mean and standard 
deviation. 
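
The period search described above can be sketched with the classical Lomb-Scargle formula. This is a minimal pure-Python illustration, not the pipeline used by the authors: the function names are hypothetical, the frequency grid is assumed to be uniform in frequency (the text gives only the number of frequencies and the period range), and the synthetic light curve in the usage example is invented for demonstration.

```python
import math


def period_grid(n, p_min, p_max):
    """n trial periods, evenly spaced in frequency between 1/p_max and 1/p_min."""
    f_lo, f_hi = 1.0 / p_max, 1.0 / p_min
    step = (f_hi - f_lo) / (n - 1)
    return [1.0 / (f_lo + i * step) for i in range(n)]


def lomb_scargle(times, values, periods):
    """Classical (unnormalised) Lomb-Scargle power at each trial period."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    power = []
    for period in periods:
        w = 2.0 * math.pi / period
        # Time offset that makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        yc = ys = cc = ss = 0.0
        for t, v in zip(times, y):
            c = math.cos(w * (t - tau))
            s = math.sin(w * (t - tau))
            yc += v * c
            ys += v * s
            cc += c * c
            ss += s * s
        p = 0.0
        if cc > 0.0:
            p += yc * yc / cc
        if ss > 0.0:
            p += ys * ys / ss
        power.append(0.5 * p)
    return power


def best_period(times, values, periods):
    """Trial period with the highest periodogram power."""
    power = lomb_scargle(times, values, periods)
    return periods[max(range(len(periods)), key=power.__getitem__)]


if __name__ == "__main__":
    # Synthetic, noise-free light curve: 4 mmag-scale modulation, 18.2 d period.
    times = [0.25 * i for i in range(360)]
    flux = [1.0 + 0.004 * math.sin(2.0 * math.pi * t / 18.2) for t in times]
    print(best_period(times, flux, period_grid(500, 5.0, 60.0)))
```

For the full search quoted in the text one would call `period_grid(20000, 0.05, 100.0)`; a vectorised implementation would be preferable at that size, since this loop-based version is written for clarity rather than speed.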

For 25 of the 30 stars, spectra acquired with Hectochelle enabled determination 
of their projected rotation velocities (vsin(i)) via cross-correlation with a library of 
synthetic spectra. Extended Data Fig. 8 shows the mean rotation periods versus the 
mean vsin(i) for stars in NGC 6819. It also displays three curves tracing the expected 
relation between rotation period and rotation velocity for a 90° inclination angle 
(i) of the stellar rotational axis and stellar radii of 0.85, 1.0 and 1.4 solar radii. The error 
bars represent the standard deviation of multiple rotation period and vsin(i) mea- 
surements. The figure demonstrates that the photometrically measured rotation 
periods are consistent with the spectroscopically derived rotation velocities. 
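
The expected relation between rotation period and projected velocity follows from v = 2πR/P. The sketch below is an illustration under stated assumptions, not the authors' code: the nominal solar radius of 695,700 km is supplied here rather than taken from the text, solid-body rotation of a spherical star is assumed, and the function names are hypothetical.

```python
import math

R_SUN_KM = 695_700.0       # nominal solar radius (assumed value)
SECONDS_PER_DAY = 86_400.0


def equatorial_velocity_kms(period_days, radius_rsun):
    """Equatorial rotation velocity v = 2*pi*R/P in km/s."""
    circumference_km = 2.0 * math.pi * radius_rsun * R_SUN_KM
    return circumference_km / (period_days * SECONDS_PER_DAY)


def vsini_kms(period_days, radius_rsun, inclination_deg=90.0):
    """Projected velocity v*sin(i); i = 90 degrees gives the upper envelope."""
    v = equatorial_velocity_kms(period_days, radius_rsun)
    return v * math.sin(math.radians(inclination_deg))


if __name__ == "__main__":
    # The three stellar radii used for the curves described in the text.
    for radius in (0.85, 1.0, 1.4):
        print(radius, round(vsini_kms(18.2, radius), 2))
```

Because sin(i) ≤ 1, a measured vsin(i) that lies on or below the 90° curve for a plausible stellar radius is consistent with the photometric period.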

3. The distribution of gyrochronology ages for the 30 NGC 6819 members. We 
have calculated the gyro ages for 21 stars with (B − V)₀ between 0.55 and 0.9 mag 
(masses between ~1.1 and 0.85 solar masses) using the model of ref. 11. Although 
this model is valid for stars with (B − V)₀ > 0.47 mag, the NGC 6819 rotation 
sequence is poorly defined (by only two stars) for 0.47 mag < (B − V)₀ < 0.55 mag. 

For each of the 21 stars we have converted its (B − V)₀ colour into a value of the 
(global) convective turnover timescale, τ, using a numerical one-to-one 
transformation table, and linear interpolation as needed. The resulting τ values are 
associated with the measured rotation period values, P, and inserted into equation (32) 
in ref. 11, that is 


$$t = \frac{\tau}{k_C}\ln\left(\frac{P}{P_0}\right) + \frac{k_I}{2\tau}\left(P^2 - P_0^2\right)$$


using an initial period of P₀ = 1.1 d, as suggested in ref. 11. The dimensionless 
constants k_C = 0.646 Myr d⁻¹ and k_I = 452 d Myr⁻¹ listed there have been retained 
unmodified. This expression provides the age, t (in Myr), explicitly in terms of the 
independent variables, P and τ. 

All of the resulting individual ages lie between 2 and 3 Gyr, as can be seen in the
histogram in Extended Data Fig. 9, which is peaked between these values. The
formal mean age is 2.49 Gyr (1σ = 0.25 Gyr), and the median age is 2.43 Gyr. The
gyrochronology age for NGC 6819 is therefore 2.49 Gyr with a standard error of
0.056 Gyr (2%, ignoring possible systematic errors in gyrochronology).

4. Is contaminating light from close neighbours a problem? NGC 6819 contains
about 2,500 stars, is located 2.4 kiloparsecs (~7,800 light years) from the
Sun and is only 13.5° above the Galactic plane. The cluster field is therefore
densely populated with both cluster and foreground field stars. Accordingly, we must 
consider whether light from nearby stars can have ‘leaked’ into the photometric 
apertures used for our cluster targets. It would be difficult, if not impossible, to eliminate
such contamination for all cluster stars with rotation period measurements.
Instead, we take a qualitative approach to building a strong case against our overall 
result being unduly influenced by contaminating neighbours. 

To accomplish this, we took advantage of our extensive stellar catalogue for
the NGC 6819 field based on the deep, high-spatial-resolution CFHT/MegaCam
images. For each of 43 cluster stars for which we initially detected periodic photometric
variability, we listed the astrometric and photometric properties for all neighbouring
stars within a 20 arcsec radius. We also used the Kepler Guest Observer
tool ‘kepfield’ to extract the pixel mask images (PMI) for each target star and for
each quarter of observations. The ‘kepfield’ tool provides the coordinates, Kepler 
magnitudes and Kepler IDs for all stars in the Kepler Input Catalog (KIC) within 
the PMI. We searched for periodic variability in the light curves for all such neigh- 
bours observed by Kepler. Collectively, this information allowed us to study the 
angular separation, relative brightness and variability of neighbours within a ~5 pixel 
radius from each of the variable cluster stars. For the brightest neighbours, the 
photometric colour indices, effective temperatures and surface gravities derived from 
our star catalogue or given in the KIC, or both, provided additional guidance re- 
garding their spectral type and evolutionary state. 

From the 43 stars of interest, we rejected 13. Some of these stars were removed 
because the detected photometric variability was equal in period and in phase to 
variability detected in a close neighbour. Others were removed because of correla- 
tions between the presence and amplitude of periodic variability and the shape and 
size of the photometric aperture. Any such correlation suggests that the source of the 
variable signal originates outside the aperture and that a change in the aperture’s 
size, shape and location results in more or less of the contaminating light being 
included. For some of the discarded stars the shape and size of the photometric 
aperture used in a given quarter was clearly affected by the signal from neighbour- 
ing stars. For others their location was significantly offset from the pixel with max- 
imum counts inside the aperture, or, in the most severe cases, was entirely outside 
the aperture in one or more quarters. 

Extended Data Fig. 7 shows the PMIs for three quarters (Q) for one accepted star 
and one quarter each for two of the rejected stars. Extended Data Fig. 7a displays 
the PMI for Q15, Q16 and Q17 for accepted star KIC 4938993 (green circle within 
optimal aperture). This ~1.2 solar-mass star (V = 15.7 mag, (B − V)₀ = 0.5 mag)
has 13 neighbours within 20 arcsec. The brightest neighbour is 1 mag brighter than 
KIC 4938993 and is located 19.6 arcsec away and outside the PMI. All other neigh- 
bours are 4 arcsec or more distant and 2.8 mag or more fainter. None of these neigh- 
bours has a light curve available from the Kepler archive. KIC 4938993 dictates the 
shape, location and size of the optimal aperture and falls on the optimal-aperture 
pixel with maximum counts in all quarters. It is thus highly unlikely that the 11.89 d, 
~30 mmag amplitude signal observed for this star originates in a neighbouring star. 
In Extended Data Fig. 7b the PMI for quarter 8 for rejected star KIC 5023712 is 
displayed. The KIC star near the left edge of the PMI is 3.2 mag brighter than 
KIC 5023712, and was not observed by Kepler. The star at (column, row) = (592.1, 
853.8) is 0.43 mag fainter and it is closer to the optimal aperture of KIC 5023712 
than the 2 pixel radius of the circular aperture that captures 95% of its signal. This 
star varies with the same period and three times the amplitude of KIC 5023712, sug- 
gesting that the periodic signal observed for KIC 5023712 originates in this fainter 
neighbour. Extended Data Fig. 7c shows the PMI for quarter 5 for rejected star 
KIC 5287900 and how a nearby star brighter by 4.7 mag causes the optimal aper- 
ture to shift off the target entirely. 

Following this analysis, we were left with 30 stars for which we believe the pe- 
riodic variability in their light curves reflects rotational modulation. Our confidence 
in the 30 rotation periods is bolstered by the comparison between their expected 
projected rotation velocities and their spectroscopically measured vsin(i) values 
(Extended Data Fig. 8). Furthermore, the narrow rotational sequence traced by the 
30 stars in the NGC 6819 CPD (Fig. 2) is by itself strong evidence against signifi- 
cant contamination. Neighbouring stars are likely to have masses (colours) or ages, 
or both, different from those of our target stars, and significant contamination would 
broaden the observed narrow sequence. 

5. Are tidal interactions with close companions a concern for gyrochronology? 
The gyrochronology age of a cool star is derived under the assumption that its ro- 
tational evolution has not been influenced by external forces. Although this is the 
case for the vast majority of cool Galactic field stars, a small fraction of stars will
have a stellar or planetary companion close enough that tidal forces can drive an
exchange of spin and orbital angular momentum between the two objects. We
provide here an estimate of the fraction of cool Galactic field stars for which tidal 
interactions can potentially affect their rotation. 

The tidal torque scales as the inverse of the binary semi-major axis to the sixth 
power, restricting significant tidal evolution for unevolved cool stars to the very 
closest systems. For stellar binaries with solar-type components, tidal synchroniza- 
tion will have a significant effect on the stellar rotation only in systems with orbital 
periods less than ~20 d (semi-major axis less than ~0.18 AU; ref. 32). Using the
distribution of orbital periods for field binaries with solar-type components, we
find that binaries with periods of less than 20 d correspond to ~5% of that population.
The same study estimates that less than half (46%) of all solar-type stars
are in binaries, implying that only ~2.5% of cool Galactic field stars have a
stellar companion close enough for effective tidal interactions.
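The correspondence between a ~20 d orbital period and a ~0.18 AU semi-major axis can be checked with Kepler's third law, and the steepness of the inverse-sixth-power torque scaling can be made concrete. The sketch below assumes a total binary mass of 2 solar masses (two solar-type components); that mass and the function names are illustrative assumptions.

```python
DAYS_PER_YEAR = 365.25


def semi_major_axis_au(period_days, total_mass_msun):
    """Kepler's third law in solar units: a^3 = M_total * P^2 (a in AU, P in yr)."""
    period_years = period_days / DAYS_PER_YEAR
    return (total_mass_msun * period_years ** 2) ** (1.0 / 3.0)


def relative_tidal_torque(a_au, a_ref_au):
    """Torque at separation a relative to a_ref, for a torque scaling as a^-6."""
    return (a_ref_au / a_au) ** 6


if __name__ == "__main__":
    # Assumed total mass: two solar-type stars, 2 solar masses in all.
    a_limit = semi_major_axis_au(20.0, 2.0)
    print(f"P = 20 d -> a = {a_limit:.3f} AU")
    print(f"Torque at twice that separation: {relative_tidal_torque(2 * a_limit, a_limit):.4f} of reference")
```

Doubling the separation weakens the torque by a factor of 64, which is why only the closest systems are affected.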

Short-period planetary companions can also tidally interact with their hosts, 
causing an increase in the stellar rotation period. The relevant class of planets here 
is the hot Jupiters, that is, planets with at least 10% of Jupiter’s mass and orbits of less 
than 10 d (ref. 43). Such planets occur around solar-type field stars with a frequency 
of ~1% (ref. 43). Although the efficiency of tidal evolution in such star—planet sys- 
tems is still uncertain on theoretical and observational grounds, it is probably only 
the hottest and most massive of the hot Jupiters that will have a significant effect 
on the rotational evolution of their host stars. 

Assuming for simplicity that hot Jupiters do not occur in binaries with orbital 
periods shorter than 20 d, we estimate that at most 3% of cool stars have stellar or 
planetary companions that can affect their rotation through tidal interactions. Thus, 
tidal effects on rotation are not a concern for ~97% of cool Galactic field stars. 
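The estimate above is simple arithmetic on the quoted fractions, sketched here under the text's simplifying assumption that hot Jupiters and short-period binaries do not overlap. The variable names are illustrative; the input numbers are the rounded values quoted in the text, so the result agrees with the stated ~3% only to within rounding.

```python
BINARY_FRACTION = 0.46          # fraction of solar-type stars in binaries (quoted)
SHORT_PERIOD_SHARE = 0.05       # share of binaries with periods below 20 d (quoted)
HOT_JUPITER_FRACTION = 0.01     # occurrence rate of hot Jupiters (quoted)


def close_binary_fraction():
    """Fraction of cool field stars with a stellar companion inside ~20 d."""
    return BINARY_FRACTION * SHORT_PERIOD_SHARE


def tidally_affected_fraction():
    """Upper estimate, treating the binary and hot-Jupiter populations as disjoint."""
    return close_binary_fraction() + HOT_JUPITER_FRACTION


if __name__ == "__main__":
    print(f"Close binaries:      {100 * close_binary_fraction():.1f}%")
    print(f"Binaries + planets:  {100 * tidally_affected_fraction():.1f}%")
    print(f"Unaffected fraction: {100 * (1 - tidally_affected_fraction()):.1f}%")
```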
Extended Data Figure 1 | The light curves, phase-folded light curves and periodograms for the stars KIC 5111207, 5023899, 5024227, 5023760, 5024122 and 5113601. For each star we show a segment of the full Kepler light curve used for determining its rotation period (a), the corresponding phase-folded light curve (b), and the periodogram (power as a function of rotation frequency, c). The KIC identification number and the measured rotation period for each star are shown above the light curve segments.
Extended Data Figure 2 | The light curves, phase-folded light curves and periodograms for the stars KIC 5112499, 5026583, 4938993, 5111834, 5111908 and 5024856. See Extended Data Fig. 1 for details.
Extended Data Figure 3 | The light curves, phase-folded light curves and periodograms for the stars KIC 5112507, 5024280, 5023796, 5024008, 5023724 and 5023875. See Extended Data Fig. 1 for details.
Extended Data Figure 4 | The light curves, phase-folded light curves and periodograms for the stars KIC 5112268, 5111939, 5025271, 4937169, 5112871 and 5023666. See Extended Data Fig. 1 for details.
Extended Data Figure 5 | The light curves, phase-folded light curves and periodograms for the stars KIC 5024182, 5023926, 4937149, 4936891, 4937119 and 4937356. See Extended Data Fig. 1 for details.
Extended Data Figure 6 | The NGC 6819 colour-magnitude diagram. The colour-magnitude diagram for stars identified as common proper-motion members of NGC 6819 and located within 5 arcmin of the cluster centre. The diagonal band tracing a tight relationship between the de-reddened photometric colour index, (B − V)₀, and brightness, V, represents the population of cluster members. The locations of the 30 stars with measured rotation periods are marked with larger red circles. They all fall along this band and are thus photometric members of NGC 6819. Stellar masses in solar units are given along the top horizontal axis at the corresponding colours. The light from distant binary companions causes the two rotators near (B − V)₀ = 0.5 mag to fall above the cluster sequence.
Extended Data Figure 7 | Pixel mask images for NGC 6819 members. Examples of pixel mask images (PMI) for the accepted star KIC 4938993 (a) and the rejected stars KIC 5023712 (b) and KIC 5287900 (c). Semi-transparent green circles mark the positions of KIC sources. Red dots correspond to the positions of fainter sources from deeper surveys within the Kepler field. The solid blue line traces the optimal aperture (optimizing the signal-to-noise ratio for the target) defined for each target star in each quarter. The shape, size and location of the optimal aperture typically differed for the different quarters of observations.
Extended Data Figure 8 | A comparison of rotation periods and projected rotation velocities for stars in NGC 6819. Projected rotation velocities (vsin(i)) plotted against the measured rotation periods for stars in NGC 6819. For comparison, three solid black curves show the expected relations between rotation period and vsin(i) for stars with respective radii of 0.85, 1.0 and 1.4 solar radii, observed at an inclination angle of their spin axes (i) of 90°. All stars plotted have single-lined spectra. The average rotation velocity resolution in the Hectochelle spectra is 7.38 km s⁻¹. The agreement between the expected and observed vsin(i) values for the measured rotation periods provides additional validation of our rotation period measurements.
Extended Data Figure 9 | The gyro age distribution for 21 cool dwarf members of NGC 6819. The gyrochronology ages for the 21 stars in the NGC 6819 CPD with (B − V)₀ colours in the range from 0.55 to 0.9 mag (masses between ~1.1 and 0.85 solar masses). The mean and median of the distribution are 2.49 and 2.43 Gyr, respectively. The standard deviation for the 21 gyro ages is 0.25 Gyr, or 10% of the mean gyro age for the cluster. The standard error of the 2.49 Gyr mean is 0.056 Gyr, or ~2%.
Extended Data Table 1 | Basic parameters and rotation period measurements for 30 members of NGC 6819

Kepler ID   Right Ascension   Declination   V       (B − V)₀   Pmean
            (h min s)         (° ′ ″)       (mag)   (mag)      (days)
5111207     19 40 09.01       40 12 16.7    15.27   0.41        5.42
5023899     19 40 55.90       40 10 04.7    15.24   0.42        4.81
5023760     19 40 49.33       40 09 07.0    15.30   0.43        4.78
5024227     19 41 09.49       40 11 41.6    15.34   0.43        5.06
5024122     19 41 05.50       40 08 28.4    15.91   0.45        6.36
5112499     19 41 13.91       40 15 30.3    15.94   0.46        4.44
5113601     19 42 00.18       40 12 19.2    15.79   0.46        7.01
5026583     19 42 55.25       40 09 48.0    15.77   0.49        4.90
4938993     19 42 55.85       40 00 19.8    15.74   0.50       11.87
5111834     19 40 43.86       40 15 18.4    16.87   0.57       13.89
5111908     19 40 43.86       40 12 53.2    17.00   0.58       17.41
5024856     19 41 30.61       40 08 42.8    17.26   0.62       18.19
5024280     19 41 11.63       40 08 34.1    17.22   0.63       17.36
5112507     19 41 14.09       40 17 34.7    17.16   0.63       18.19
5023796     19 40 51.10       40 09 45.1    17.23   0.64       18.30
5024008     19 41 00.51       40 10 29.4    17.37   0.65       18.40
5023724     19 40 47.50       40 07 40.3    17.42   0.66       18.00
5023875     19 40 54.82       40 09 20.5    17.44   0.67       18.33
5112268     19 41 04.25       40 14 23.2    17.56   0.68       18.70
4937169     19 41 22.61       40 05 44.6    17.62   0.70       19.62
5025271     19 41 47.15       40 09 30.3    17.57   0.70       21.29
5111939     19 40 49.63       40 12 05.2    17.70   0.70       21.75
5112871     19 41 26.64       40 16 01.4    17.77   0.71       21.25
5023666     19 40 44.23       40 10 11.6    17.84   0.73       21.54
5024182     19 41 07.57       40 08 50.3    17.83   0.75       21.29
5023926     19 40 56.89       40 06 10.1    17.89   0.77       20.82
4937149     19 41 21.54       40 05 21.1    18.07   0.80       21.68
4936891     19 41 07.75       40 05 55.0    18.41   0.85       21.98
4937119     19 41 19.90       40 05 39.7    18.43   0.87       23.28
4937356     19 41 33.09       40 05 12.4    18.46   0.89       21.23

[The nP and membership-probability (%) columns are not recoverable. The surviving σP values (days), in extracted order with row alignment lost, are: 1.05, 0.44, 0.10, 0.39, 1.04, 0.92, 0.65, 0.98, 1.46, 0.64; 0.62, 0.19, 1.19, 0.91, 0.94, 0.25; 0.10, 2.85, 1.83, 0.61, 0.63, 0.81, 0.77, 1.73.]

The Kepler ID is from the KIC. Right ascension and declination are equinox J2000. The stellar brightness, V, and colour index, (B − V)₀, are based on CCD photometry using the CFH12K mosaic CCD on the 3.6 m CFHT. The B − V colours were de-reddened using a value for the colour excess E(B − V) of 0.15 (refs 38, 39, 45). For stars with more than one rotation period measurement (nP > 1), the Pmean and σP columns list the mean period and the standard deviation of the multiple measurements.

Michelson–Morley analogue for electrons using trapped ions to test Lorentz symmetry

T. Pruttivarasin, M. Ramm, S. G. Porsev, I. I. Tupitsyn, M. S. Safronova, M. A. Hohensee & H. Häffner


All evidence so far suggests that the absolute spatial orientation of an 
experiment never affects its outcome. This is reflected in the stand- 
ard model of particle physics by requiring all particles and fields to 
be invariant under Lorentz transformations. The best-known tests 
of this important cornerstone of physics are Michelson-Morley-type 
experiments verifying the isotropy of the speed of light. For matter,
Hughes–Drever-type experiments test whether the kinetic
energy of particles is independent of the direction of their velocity, 
that is, whether their dispersion relations are isotropic. To provide 
more guidance for physics beyond the standard model, refined ex- 
perimental verifications of Lorentz symmetry are desirable. Here we 
search for violation of Lorentz symmetry for electrons by perform- 
ing an electronic analogue of a Michelson—Morley experiment. We 
split an electron wave packet bound inside a calcium ion into two parts 
with different orientations and recombine them after a time evo- 
lution of 95 milliseconds. As the Earth rotates, the absolute spatial 
orientation of the two parts of the wave packet changes, and aniso- 
tropies in the electron dispersion will modify the phase of the inter- 
ference signal. To remove noise, we prepare a pair of calcium ions 
in a superposition of two decoherence-free states, thereby rejecting 
magnetic field fluctuations common to both ions. After a 23-hour
measurement, we find a limit of h × 11 millihertz (h is Planck’s constant)
on the energy variations, verifying the isotropy of the electron’s
dispersion relation at the level of one part in 10¹⁸, a 100-fold improvement
on previous work. Alternatively, we can interpret our result as
testing the rotational invariance of the Coulomb potential. Assuming
that Lorentz symmetry holds for electrons and that the photon
dispersion relation governs the Coulomb force, we obtain a fivefold-improved
limit on anisotropies in the speed of light. Our result
probes Lorentz symmetry violation at levels comparable to the ratio
between the electroweak and Planck energy scales. Our experiment
demonstrates the potential of quantum information techniques in 
the search for physics beyond the standard model. 

Invariance under Lorentz transformations is a key feature of the standard
model, and as such is fundamental to nearly every aspect of modern
physics. Nevertheless, this symmetry may be measurably violated, for
example, as a result of spontaneous symmetry breaking in quantum fields
with dynamics at experimentally inaccessible energy scales not explicitly
treated by the standard model. Some theories that unify gravitation
and the standard model assert that Lorentz symmetry is valid only
at large length scales. A natural estimate of the fractional shift of
electron dispersion relations due to Lorentz violation at the Planck scale
is given by the ratio between the electroweak and Planck energy scales,
that is, ~2 × 10⁻¹⁷ (ref. 13). Other models suggest that large Lorentz
violation at the Planck scale is suppressed by supersymmetry. In such
scenarios, the constraints on Lorentz violation for neutrons can be used
to set an upper bound of order 100 TeV on the supersymmetric energy
scale. Therefore, precision tests of Lorentz symmetry complement
direct probes of high-energy physics being carried out at the Large
Hadron Collider.
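The order of magnitude of the electroweak-to-Planck ratio can be reproduced with a one-line calculation. The sketch below assumes an electroweak scale of about 246 GeV and a Planck energy of about 1.22 × 10¹⁹ GeV; these are commonly quoted values and may not be the exact definitions used in ref. 13.

```python
ELECTROWEAK_SCALE_GEV = 246.0     # assumed: Higgs vacuum expectation value
PLANCK_ENERGY_GEV = 1.22e19       # assumed: Planck energy


def scale_ratio(low_gev=ELECTROWEAK_SCALE_GEV, high_gev=PLANCK_ENERGY_GEV):
    """Dimensionless ratio of two energy scales."""
    return low_gev / high_gev


if __name__ == "__main__":
    print(f"Electroweak / Planck ratio: {scale_ratio():.2e}")
```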

We analyse Lorentz violation in the context of a phenomenological
framework known as the standard model extension (SME). The SME
is an effective field theory that augments the standard model Lagrangian
with every possible combination of the standard model fields that is not
term-by-term Lorentz invariant, while maintaining gauge invariance,
energy-momentum conservation, and Lorentz invariance of the total
action. The SME can be used to describe the low-energy limit of many
different theories which predict Lorentz violation, and includes the
standard model as a limiting case. The SME thus provides a comprehensive
framework for quantifying a wide range of Lorentz-violating
effects, and is a flexible tool for consistently evaluating a wide variety of
experiments.

The SME allows for Lorentz violation for all particles separately. How- 
ever, to verify a particle’s Lorentz symmetry, it must be compared with 
a reference system because only differences in their behaviours under 
Lorentz transformation are observable. For instance, typical interpretations
of Michelson–Morley experiments testing Lorentz violation of
photons assume that the lengths of the interferometer arms are invariant
under rotations. Because the lengths of interatomic bonds depend
on the electron dispersion relation, those interpretations can be said
to assume that Lorentz symmetry for electrons (and nuclei making up
the interferometer arms) holds unless a second distinct reference system
is used. For our experiment, it seems more natural to use light as a
reference and assume that photons obey Lorentz symmetry. However, 
it is important to keep in mind that an experimental signature of the 
Lorentz violation considered here can equally be attributed to Lorentz 
violation of electrons as well as to that of photons, which would man- 
ifest itself as an asymmetry of the photon-mediated Coulomb poten- 
tial (Methods). Thus, we take the most general view, namely that we 
measure the difference between the electron and photon anisotropies. 

We take this view by choosing a coordinate system in which a hypothetical
Lorentz violation in light manifests itself in the electronic
Lagrangian (Methods). We obtain the modified electronic quantum electrodynamics
Lagrangian

$$ \mathcal{L} = \frac{1}{2}\, i\,\bar{\psi}\left(\gamma_\nu + c'_{\mu\nu}\gamma^\mu\right)\overleftrightarrow{D}^{\nu}\psi - \bar{\psi}\, m_e\, \psi \qquad (1) $$

where $m_e$ is the electron mass, $\psi$ is a Dirac spinor, $\gamma^\mu$ are the Dirac matrices
and $\bar{\psi}\overleftrightarrow{D}^{\nu}\psi \equiv \bar{\psi}D^{\nu}\psi - \psi D^{\nu}\bar{\psi}$ with $D^\nu$ being the covariant derivative.
The effect due to Lorentz violation is described by the tensor $c'_{\mu\nu} \equiv c_{\mu\nu} + k_{\mu\nu}/2$,
which contains Lorentz-violation parameters from both
the electron ($c_{\mu\nu}$) and the photon ($k_{\mu\nu}$) sectors. Because $c'_{\mu\nu}$ is frame
dependent, we uniquely specify its value in the Sun-centred, celestial-equatorial
frame (SCCEF), that is, the Sun’s rest frame. Time-dependent
Lorentz transformations due to the Earth’s motion transform $c'_{\mu\nu}$ in
the SCCEF to the time-dependent values in the local laboratory frame
University of Delaware, Newark, Delaware 19716, USA. Petersburg Nuclear Physics Institute, Gatchina, Leningrad District 188300, Russia. Department of Physics, St Petersburg State University,
Ulianovskaya 1, Petrodvorets, St Petersburg 198504, Russia. Joint Quantum Institute, National Institute of Standards and Technology and the University of Maryland, College Park, Maryland 20742, USA.
on the Earth. Hence, the contribution of $c'_{\mu\nu}$ to any laboratory-frame
observable will vary in time.

For us, the important consequence of electronic Lorentz violation is
the dependence of an electron’s energy on the direction of its momentum.
For an atomically bound electron with momentum p, the Lagrangian
in equation (1) results in a small energy shift that depends on
the direction of the electron’s momentum and is described by the effective
Hamiltonian

$$ \delta H = -C_0^{(2)}\,\frac{\mathbf{p}^2 - 3p_z^2}{6 m_e} \qquad (2) $$

where $C_0^{(2)}$ contains elements of $c'_{\mu\nu}$ in the laboratory frame and $p_z$ is
the component of the electron’s momentum along the quantization axis,
which is fixed in the laboratory. The energy shift depends on how the
total momentum p is distributed among the three spatial components.
As the Earth rotates, $C_0^{(2)}$ varies in time, resulting in a time variation of
the electron’s energy correlated with the Earth’s motion.

To probe Lorentz violation, we perform the electronic analogue of a
Michelson–Morley experiment by interfering atomic states with anisotropic
electron momentum distributions aligned along different directions,
such as available in the $^2D_{5/2}$ manifold of $^{40}$Ca$^{+}$. We trap a pair of
$^{40}$Ca$^{+}$ ions with an ion-ion separation of ~16 μm in a linear Paul trap,
and define the quantization axis by applying a static magnetic field of
3.930 G vertically. The direction of this magnetic field changes with
respect to the Sun as the Earth rotates, resulting in a rotation of our
interferometer (Fig. 1).

We calculate the Lorentz-violation-induced hypothetical energy shift
of $^{40}$Ca$^{+}$ in the $^2D_{5/2}$ manifold according to equation (2) (expressed
here in hertz):

$$ \frac{\Delta E}{h} = \left[(2.16\times10^{15}) - (7.42\times10^{14})\,m_J^2\right] C_0^{(2)} $$

where $m_J$ is the magnetic quantum number (Methods). To obtain maximum
sensitivity to Lorentz violation, we monitor the energy difference
between the states $|\pm5/2\rangle \equiv |{}^2D_{5/2};\, m_J = \pm5/2\rangle$ and
$|\pm1/2\rangle \equiv |{}^2D_{5/2};\, m_J = \pm1/2\rangle$ using a Ramsey-type interferometric scheme. To
reject magnetic field noise, which is the main source of decoherence,
we create a product state $|\Psi\rangle = (1/2)(|-1/2\rangle + |-5/2\rangle)\otimes(|+1/2\rangle + |+5/2\rangle)$
by applying to both ions a series of π/2- and π-pulses on

Figure 1 | Rotation of the quantization axis of the experiment with respect
to the Sun as the Earth rotates. We apply a magnetic field (B) of 3.930 G
vertically in the laboratory frame to define the quantization axis of the
experiment. As the Earth rotates with an angular frequency given by
$\omega_\oplus = 2\pi/(23.93\ \mathrm{h})$, the orientation of the quantization axis and, consequently,
that of the electron wave packet (as shown in the inset in terms of probability
envelopes) changes with respect to the Sun’s rest frame (positions at
various times UTC are illustrated). The angle $\chi \approx 52.1°$ is the colatitude of
the experiment.
the $S_{1/2}$–$D_{5/2}$ transition. Under common noise induced by a fluctuating
magnetic field, the product state rapidly dephases to a mixed state
that contains a decoherence-free entangled state
$|\Psi^R\rangle = (1/\sqrt{2})(|-5/2,+5/2\rangle + |-1/2,+1/2\rangle)$ with 50% probability. This entangled
state evolves freely in time according to

$$ |\Psi^R(t)\rangle = \frac{1}{\sqrt{2}}\left(|-5/2,+5/2\rangle + e^{i(\Delta E_R t/\hbar + \phi_R)}\,|-1/2,+1/2\rangle\right) $$

where $\Delta E_R$ is the energy difference between the states $|-5/2,+5/2\rangle$ and
$|-1/2,+1/2\rangle$, $\phi_R$ is a phase offset and $\hbar$ is Planck’s constant divided by
2π. The remaining components of the mixed state, which are the states
$|-5/2,+1/2\rangle$ and $|-1/2,+5/2\rangle$, each with 25% probability, are time
independent.

In Fig. 2, we illustrate the dynamics of the state $|\Psi^R\rangle$. By expressing
the state in the even–odd parity basis, $|\pm\rangle = (1/\sqrt{2})(|-5/2,+5/2\rangle \pm |-1/2,+1/2\rangle)$,
the time-dependent state can be written as

$$ |\Psi^R(t)\rangle = \frac{1}{2}\left[\left(1 + e^{i(\Delta E_R t/\hbar + \phi_R)}\right)|+\rangle + \left(1 - e^{i(\Delta E_R t/\hbar + \phi_R)}\right)|-\rangle\right] \qquad (3) $$

We interpret the trajectory of $|\Psi^R(t)\rangle$ to be along the equator of the
Bloch sphere as shown in Fig. 2b. The state $|\Psi^R(t)\rangle$ oscillates back and
forth between the states $|+\rangle$ and $|-\rangle$ with frequency $f_R = \Delta E_R/h$. To
read out the ion state in the $|\pm\rangle$ basis, we apply to both ions a series of
π- and π/2-pulses on the $S_{1/2}$–$D_{5/2}$ transition, followed by an electron-shelving
readout scheme. The difference between the probabilities
a 
Figure 2 | Oscillation of the decoherence-free state. a, A combination
of different magnetic sublevels of the first (circles) and second (triangles)
$^{40}$Ca$^{+}$ ions in the $^2D_{5/2}$ manifold forms a decoherence-free state
$|\Psi^R\rangle = (1/\sqrt{2})(|-5/2,+5/2\rangle + |-1/2,+1/2\rangle)$. Blue and red colours
indicate pairing of the single-ion states in each component of $|\Psi^R\rangle$. b, Time
evolution of the state $|\Psi^R(t)\rangle$ represented by a trajectory on a Bloch sphere
with poles given by $|-5/2,+5/2\rangle$ and $|-1/2,+1/2\rangle$. (We neglected
contributions from the states $|-5/2,+1/2\rangle$ and $|-1/2,+5/2\rangle$, which have no
phase coherence.) The state $|\Psi^R(t)\rangle$ oscillates back and forth between the
even–odd parity basis states, $|\pm\rangle$, as given in equation (3). c, Oscillation of a
product state containing an entangled state $|\Psi^R\rangle$ with 50% probability.
Each data point is derived from 200 repetitions of the Ramsey-type
experimental cycle shown in Fig. 3a. The error bars (no larger than data
symbols, and omitted to simplify figure) are obtained from requiring that the
fit to the Ramsey fringe function (grey solid line) gives $\chi^2_{\mathrm{reduced}} = 1$ and
assuming that the data are normally distributed. The fit yields an oscillation
frequency of 164.9 ± 0.1 Hz and a decay constant of 155 ± 17 ms, which is
substantially smaller than the value expected from the lifetime of the $D_{5/2}$
state of $^{40}$Ca$^{+}$. We attribute the loss of coherence to the heating rate of the ion
trap, of ~0.2 quanta ms⁻¹, which degrades the quality of the analysis pulses
for long Ramsey interrogation times. To save measurement time, data were not
taken in the ~60–80 ms interval.
$P_+$ and $P_-$ of the ions being in the states $|+\rangle$ and $|-\rangle$, respectively,
yields an oscillating signal given by $P = P_+ - P_- = \cos(\Delta E_R t/\hbar + \phi_R)$
(Fig. 2c).
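The cosine form of the parity signal follows directly from the amplitudes of the entangled state in the even–odd parity basis. The sketch below computes those amplitudes numerically and forms P₊ − P₋; it models only the ideal entangled component, ignores decay and the time-independent part of the mixed state, and uses function names chosen for illustration.

```python
import cmath
import math


def parity_amplitudes(theta):
    """Amplitudes on |+> and |-> for (|a> + e^{i theta}|b>)/sqrt(2), |±> = (|a> ± |b>)/sqrt(2)."""
    phase = cmath.exp(1j * theta)
    return 0.5 * (1 + phase), 0.5 * (1 - phase)


def parity_signal(theta):
    """P = P_+ - P_- for accumulated phase theta = Delta E t / hbar + phi."""
    amp_plus, amp_minus = parity_amplitudes(theta)
    return abs(amp_plus) ** 2 - abs(amp_minus) ** 2


if __name__ == "__main__":
    frequency_hz = 164.9   # oscillation frequency quoted in the Fig. 2 caption
    for t_ms in (0.0, 1.0, 2.0, 3.0):
        theta = 2.0 * math.pi * frequency_hz * t_ms * 1e-3
        print(f"t = {t_ms:3.1f} ms -> P = {parity_signal(theta):+.3f}")
```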

We are interested in the variations in the energy difference between
the $|\pm5/2,\mp5/2\rangle$ and $|\pm1/2,\mp1/2\rangle$ states due to Lorentz violation.
However, the energy difference is also affected by linear Zeeman shifts
from a residual magnetic field gradient, quadratic Zeeman shifts, electric
quadrupole shifts from an electric field gradient, and a.c. Stark shifts
from oscillating trapping fields. The contributions from the magnetic
field gradient, which are of order 100 Hz, have opposite signs for
the state $|\Psi^R\rangle$ and its mirrored counterpart,
$|\Psi^L\rangle = (1/\sqrt{2})(|+5/2,-5/2\rangle + |+1/2,-1/2\rangle)$. We can subtract out the contribution from
the magnetic field gradient to the oscillation signal by taking the average
frequency $\bar{f} = (f_R + f_L)/2$, where $f_R$ and $f_L$ are the oscillation frequencies
of states $|\Psi^R\rangle$ and $|\Psi^L\rangle$, respectively (Extended Data Fig. 1).
The remaining effects (except for Lorentz violation) are energy shifts of
order of only a few hertz and are also directly related to external electromagnetic
fields in the proximity of the ions. We expect these fields to

Figure 3 | Outline of the experimental scheme. a, The building blocks of our
experiment are Ramsey-type interferometric sequences. In each measurement
cycle, we first perform Doppler cooling and optical pumping of the ions.
Then a series of π/2- and π-pulses on the $S_{1/2}$–$D_{5/2}$ transition prepares the ions in
a product state that dephases into a mixed state within 1 ms. This state contains
an entangled state $|\Psi^{L,R}\rangle = (1/\sqrt{2})(|\pm5/2,\mp5/2\rangle + |\pm1/2,\mp1/2\rangle)$ with
50% probability. Afterwards, the mixed state evolves freely for Ramsey duration
T, before another series of π- and π/2-pulses, together with an electron-shelving
readout sequence, allows us to read out the state of the ions in the even–odd
parity basis. This measurement cycle is repeated 200 times each for $|\Psi^R\rangle$
and $|\Psi^L\rangle$. b, To correct for phase drifts in the preparation of $|\Psi^{L,R}\rangle$, we
measure the difference in the oscillation signal between Ramsey durations of
100 and 5 ms. We then correct for the contribution of the magnetic field
gradient by taking the average of the oscillation signals measured with states
$|\Psi^R\rangle$ and $|\Psi^L\rangle$. At the end of this measurement block, we measure the
magnetic field by performing spectroscopy on the $S_{1/2}$–$D_{5/2}$ transition to
correct for the quadratic Zeeman effect. Each grey data point in Fig. 4a is a result
from one of these measurement blocks. c, We continuously repeat the
measurement block during the course of the 23 h-long measurement. To
correct for the electric quadrupole shift caused by the electric field gradient,
we measure the axial trap frequency by performing spectroscopy on the
$S_{1/2}$–$D_{5/2}$ transition.
be stable on the $10^{-3}$ level over the course of a day, and the associated
variations are on the level of a few millihertz and below. Moreover,
we independently measure these fields using the ions themselves as a
probe (Methods).

We measured the energy difference between the states $|\pm5/2,\mp5/2\rangle$ and $|\pm1/2,\mp1/2\rangle$ of $^{40}$Ca$^{+}$ for 23 h starting from 3:00 coordinated universal time (UTC) on 19 April 2014, by monitoring the oscillation signal of the ions with an effective Ramsey duration of 95 ms (Methods). At the same time, we monitored the magnetic field and the electric field gradient using the ions themselves as a probe (Fig. 3). We then used the measured values of the magnetic field and electric field gradient to correct for the quadratic Zeeman and electric quadrupole shifts. The resulting 23 h frequency measurement is shown in Fig. 4. With 23 h of averaging, we reach a sensitivity of the oscillation frequency of 11 mHz, limited by statistical uncertainties due to short-term fluctuations. We then attribute any residual variation in the energy correlated with the Earth’s rotation to Lorentz violation.

A Lorentz transformation of $c_{\mu\nu}$ from the SCCEF to the laboratory
frame results in a time-dependent energy shift due to Lorentz violation
given by


Figure 4 | Frequency measurements for $^{40}$Ca$^{+}$. a, The grey data points represent frequency measurements of $^{40}$Ca$^{+}$ taken after each measurement block as shown in Fig. 3b, with contributions from the quadratic Zeeman shifts and electric quadrupole shifts subtracted out. (Gaps in the data points are due to a failure of the laser frequency stabilization.) We started the measurement at 3:00 UTC on 19 April 2014, and continued for 23 h. The blue points are obtained by binning data from 60 min time intervals. The error bars represent 1 s.d. of the data points within the bin, where we scale the error by $\sqrt{\chi^2_{\rm reduced}} = 1.3$ (obtained from the fit of the binned data to the model in equation (4)). b, The blue points show the Allan deviation of the frequency measurement, $\sigma_f$ (Hz), as a function of averaging time (s), calculated from the unbinned data with error bars representing 1 s.d. The red solid line is the estimated quantum projection noise. The green dashed line is a fit to the data, showing a sensitivity to the ions’ energy variation of $\sigma_f = 3.3\ \mathrm{Hz}/\sqrt{\tau}$, where $\tau$ is the averaging time. The steady downward trend indicates that we are still limited by statistical fluctuations rather than by correlated noise or systematics over the course of the measurement.
Table 1 | Limits on differential electron–photon Lorentz-violation parameters $c'_{\mu\nu} = c_{\mu\nu} + k_{\mu\nu}/2$

| Parameters ($c'_{\mu\nu} = c_{\mu\nu} + k_{\mu\nu}/2$) | New limits | Existing limits, $c_{\mu\nu}$ (electrons) | Existing limits, $k_{\mu\nu}/2$ (photons) |
|---|---|---|---|
| $-0.16c'_{X-Y} + 0.33c'_{XY} - 0.92c'_{XZ} - 0.16c'_{YZ}$ | $0.1 \pm 1.0 \times 10^{-18}$ | $-0.9 \pm 1.0 \times 10^{-16}$ | $-2.5 \pm 3.5 \times 10^{-18}$ |
| $-0.04c'_{X-Y} - 0.32c'_{XY} - 0.35c'_{XZ} + 0.88c'_{YZ}$ | $2.4 \pm 7.4 \times 10^{-19}$ | $-0.9 \pm 6.5 \times 10^{-17}$ | $-5.2 \pm 3.6 \times 10^{-18}$ |
| $0.29c'_{X-Y} - 0.38c'_{XY} - 0.73c'_{XZ} - 0.48c'_{YZ}$ | $5.9 \pm 9.5 \times 10^{-19}$ | $-8.1 \pm 9.5 \times 10^{-17}$ | $-0.6 \pm 3.8 \times 10^{-18}$ |
| $0.31c'_{X-Y} - 0.65c'_{XY} + 0.07c'_{XZ} - 0.69c'_{YZ}$ | $0.7 \pm 1.2 \times 10^{-18}$ | $-2.9 \pm 6.5 \times 10^{-17}$ | $-2.6 \pm 3.8 \times 10^{-18}$ |

Fitting our frequency measurements to the model in equation (4) yields the limits on Lorentz-violation parameters $c'_{\mu\nu}$ in the SCCEF. All uncertainties for the uncorrelated combinations of $c'_{\mu\nu}$ are 1 s.d. from the fit, conservatively scaled with $\sqrt{\chi^2_{\rm reduced}} = 1.3$. We improve the bounds from ref. 9 on the electron dispersion by up to two orders of magnitude. Alternatively, we can work in coordinates such that the electron dispersion is isotropic. We then improve by up to five times on the existing limits for the isotropy of the speed of light set by a modern version of the classic Michelson–Morley experiment in ref. 2 (Methods). Note that the work in ref. 9 assumed that $k_{\mu\nu} = 0$ whereas that in ref. 2 assumed that $c_{\mu\nu} = 0$. We use the notation $c'_{X-Y} = c'_{XX} - c'_{YY}$.


$$\frac{\Delta E}{h} = A\cos(\omega_\oplus T) + B\sin(\omega_\oplus T) + C\cos(2\omega_\oplus T) + D\sin(2\omega_\oplus T) \qquad (4)$$

where $\omega_\oplus = 2\pi/(23.93\ \mathrm{h})$ is the sidereal angular frequency of the Earth’s rotation, $T$ is the time since the vernal equinox of 2014, and $A$, $B$, $C$ and $D$ are parameters related to $c_{\mu\nu}$ in the SCCEF (Methods). Fitting our data (Fig. 4) to equation (4) yields the limits of the $c_{\mu\nu}$ parameters, which we report and compare with existing limits in Table 1. We improve the best measurements for those parameters, carried out by precision spectroscopy of dysprosium (ref. 9), by up to two orders of magnitude, to a level of $10^{-18}$. Alternatively, we can assume that Lorentz symmetry holds for electrons. We then can interpret our results as limits on Lorentz violation for photons (Methods) and improve on the bounds for Lorentz symmetry set by photon Michelson–Morley experiments (ref. 2) by up to five times (Table 1).

Our experimental scheme is readily applicable to other trapped-ion species considered for quantum information purposes. Many of those possess a long-lived electronic state with a non-vanishing angular momentum. Thus, further improvement can be achieved by increasing the Ramsey durations using metastable states with significantly longer lifetimes, such as 30 s for the barium ion, or by using ions more sensitive to Lorentz violation, such as highly charged ions. Additionally, by preparing a pure entangled state of the ions instead of a mixed state, it is possible to gain another factor of two in signal-to-noise ratio. Finally, we do not see any signature of limiting systematic effects, and thus expect that future extensions of our experimental technique with better statistics will yield tests of Lorentz symmetry at the level of $10^{-20}$ and below, where the polarization of black-body radiation in combination with temperature changes is expected to become relevant.
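As an illustration of how a model of the form of equation (4) can be fitted, the following is a minimal sketch of a weighted linear least-squares fit for the four amplitudes. It is not the analysis code used for the measurement: the function names (`basis`, `solve`, `fit_sidereal`) are hypothetical, times are assumed to be in seconds since the reference epoch, and no constant offset is fitted.

```python
import math

# Sidereal angular frequency of the Earth's rotation, 2*pi/(23.93 h), in rad/s.
OMEGA_SIDEREAL = 2.0 * math.pi / (23.93 * 3600.0)


def basis(t):
    """Basis functions of equation (4) evaluated at time t (seconds)."""
    return [
        math.cos(OMEGA_SIDEREAL * t),
        math.sin(OMEGA_SIDEREAL * t),
        math.cos(2.0 * OMEGA_SIDEREAL * t),
        math.sin(2.0 * OMEGA_SIDEREAL * t),
    ]


def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - rest) / a[r][r]
    return x


def fit_sidereal(times, values, sigmas):
    """Return [A, B, C, D] minimising the chi-square of the four-term model."""
    n = 4
    normal = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for t, y, s in zip(times, values, sigmas):
        b = basis(t)
        w = 1.0 / (s * s)  # weight each point by its inverse variance
        for i in range(n):
            rhs[i] += w * b[i] * y
            for j in range(n):
                normal[i][j] += w * b[i] * b[j]
    return solve(normal, rhs)
```

In this sketch the fitted amplitudes are in the same units as `values`; converting them to the $c_{\mu\nu}$ combinations requires the frequency coefficient and frame transformation given in the Methods.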


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Lorentz-violation parameters of electrons and photons. While Lorentz symmetry, or local Lorentz invariance, requires that the laws of physics be the same in all coordinate systems in the group formed by Lorentz transformations, it does not restrict our initial choice of coordinates. As a result, some forms of Lorentz violation cannot be unambiguously attributed to a single species of elementary particle without first specifying this coordinate choice. In particular, we can select our initial coordinates such that $c_{\mu\nu}$ (or its gauge field analogue $k_{\mu\nu}$) vanishes at leading order for any single species of particle (or gauge field). This particle then becomes a Lorentz-covariant ‘yardstick’ against which other species can be compared. For instance, one might use light as the yardstick; that is, one would measure space such that $x_i = c_i t$ with the speed of light $c_i$ constant in all three spatial directions $i$. Alternatively, one might use a coordinate system for which Lorentz symmetry is preserved for electrons. Then the value of $c_i t$ might not be the same in all three spatial directions. In this case Lorentz violation would manifest itself by breaking the rotational symmetry of the photon-mediated Coulomb force, yielding the same measurable energy shift as in the previous case. Consequently, a single experimental approach typically constrains a linear combination of parameters for particles and gauge fields.

To analyse which linear combination in the SME we test in our experiment, we
neglect contributions of the nucleus to the Lorentz-violation signal for two
reasons. First, the quadrupole moment of the doubly magic $^{40}$Ca$^{+}$ nucleus is expected
to vanish. Second, the violations of Lorentz symmetry for nucleon constituents 
have been constrained to 10° *° for protons” and 10°”? for neutrons*”. In addition 
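to the Lorentz-violating Lagrangian for electrons (equation (1)) in the SME, the Lorentz violation for electromagnetic fields (photons) is given by the parameters $\tilde{\kappa}_{e-}^{JK}$ ($J, K = X, Y, Z$) (which are functions of $k_{\mu\nu}$) in the Lagrangian

$$\mathcal{L} = \frac{1}{2}\left[(1+\tilde{\kappa}_{\rm tr})|\mathbf{E}|^2 - (1-\tilde{\kappa}_{\rm tr})|\mathbf{B}|^2\right] + \frac{1}{2}\left[\mathbf{E}\cdot\tilde{\kappa}_{e-}\cdot\mathbf{E} - \mathbf{B}\cdot\tilde{\kappa}_{e-}\cdot\mathbf{B}\right] + \mathbf{E}\cdot\tilde{\kappa}_{o+}\cdot\mathbf{B} \qquad (5)$$

where $\tilde{\kappa}_{\rm tr}$ is a scalar, $\tilde{\kappa}_{e-}$ is a 3 × 3 traceless symmetric matrix that characterizes the anisotropy of the speed of light, and $\tilde{\kappa}_{o+}$ is a 3 × 3 antisymmetric matrix. By means of a coordinate transformation, observation of Lorentz violation for both the electrons and photons can be made to appear only in either the SME Lagrangian for the electron sector (equation (1)) or the photon sector (equation (5)). In both cases, the linear combinations of parameters relevant to our experiment are

$$c'_{X-Y} = c_{XX} - c_{YY} + \frac{1}{2}\left(\tilde{\kappa}_{e-}^{XX} - \tilde{\kappa}_{e-}^{YY}\right)$$

$$c'_{XY} = c_{XY} + \frac{1}{2}\tilde{\kappa}_{e-}^{XY}$$

$$c'_{XZ} = c_{XZ} + \frac{1}{2}\tilde{\kappa}_{e-}^{XZ}$$

$$c'_{YZ} = c_{YZ} + \frac{1}{2}\tilde{\kappa}_{e-}^{YZ}$$

The best existing limits on $\tilde{\kappa}_{e-}^{XY}$, $\tilde{\kappa}_{e-}^{XZ}$, $\tilde{\kappa}_{e-}^{YZ}$ and $\tilde{\kappa}_{e-}^{XX} - \tilde{\kappa}_{e-}^{YY}$ are given in ref. 2. In Table 1, we compare our results to these limits.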

Experimental set-up. We trap a pair of $^{40}$Ca$^{+}$ ions in a linear Paul trap with an interelectrode distance of 1.0 mm. We apply a radio-frequency drive with a peak-to-peak voltage of ~500 V to each pair of the radio-frequency electrodes. One pair of the electrodes is driven in antiphase with the other pair. With a ~4 V d.c. voltage applied across the end caps, we obtain trap frequencies of $2\pi \times (2.2, 2.0)$ MHz in the radial directions and $2\pi \times 210$ kHz in the axial direction. The axial direction is aligned horizontally in the laboratory frame. To define a quantization axis, we apply a static magnetic field of 3.930 G vertically (at 45° with respect to both radial directions of the trap) using a coil. Additionally, we use another magnetic coil to compensate for the residual magnetic field gradient along the axial direction.

Two independent 729 nm laser beams in the vertical direction drive $\pi$- and $\pi/2$-pulses on the $S_{1/2}$–$D_{5/2}$ transition on each ion separately. Both beams are derived from a laser stabilized to a high-finesse optical cavity to better than 100 Hz. Another beam path addressing both ions in the horizontal direction (at 45° with respect to the axial direction) is used for Doppler cooling (397 nm and 866 nm) and repumping the $D_{5/2}$ state (854 nm). We perform all laser light switching and frequency shifting using acousto-optical modulators (AOMs) in a double-pass configuration. We generate all radio-frequency voltages supplied to the AOMs using direct-digital-synthesizer chips from Analog Devices (AD9910). The timing in the experimental sequence is controlled by a field-programmable gate-array (FPGA)
module (XEM6010) from Opal Kelly. We characterize the stability of the on-board
crystal oscillator using a frequency counter (Agilent 53210A). The clock stability is 
measured to be at the level of 4 X 107”, which translates to better than 5 Hz sta- 
bility in the oscillation signal of the measurement of Lorentz violation. 


Measurement scheme. The experimental sequence is shown in Fig. 3. We measure four independent oscillation signals for the two states $|\Psi^{R}\rangle = (1/\sqrt{2})(|+5/2,-5/2\rangle + |+1/2,-1/2\rangle)$ and $|\Psi^{L}\rangle = (1/\sqrt{2})(|-5/2,+5/2\rangle + |-1/2,+1/2\rangle)$, each with both short ($T_{\rm short} = 5$ ms) and long ($T_{\rm long} = 100$ ms) Ramsey duration (Fig. 3b). Within each measurement block in Fig. 3b, the order in which we perform Ramsey spectroscopy for each state and the Ramsey duration is randomized to average out systematic noise that might coincide with the period (~60 s) of the measurement block.

In general, the oscillation signal has the form $S(t) = A\cos(2\pi f t + \phi_{\rm offset} + \phi_{\rm laser}) + B$, where $A$ is the amplitude of the signal, $B$ is a possible offset to the overall level of the signal, $f$ is the oscillation frequency, $\phi_{\rm offset}$ is the phase offset and $\phi_{\rm laser}$ is an additional phase that we can control by changing the phase of the 729 nm laser (using the radio-frequency signal supplied to the AOM for each beam path) that drives $\pi$- and $\pi/2$-pulses on the $S_{1/2}$–$D_{5/2}$ transition of the ions.

For a given state and Ramsey duration, the Ramsey interferometric cycle shown in Fig. 3a is repeated 200 times. To cancel out drifts in the offset of the signal, $B$, we perform the first 100 cycles of the Ramsey sequence with the phase of the laser given by $\phi_{\rm laser}$ and the next 100 cycles with the phase of the laser given by $\phi_{\rm laser} + \pi$. We then calculate the difference between these two signals, $(S(\phi_{\rm laser}) - S(\phi_{\rm laser} + \pi))/2 = A\cos(2\pi f t + \phi_{\rm offset} + \phi_{\rm laser})$, which does not depend on $B$.

For a fixed Ramsey duration $T$, the oscillation signal $S(T) = A\cos(2\pi f T + \phi_{\rm offset} + \phi_{\rm laser})$ is most sensitive to variation in the oscillation frequency, $f$, when the signal crosses zero, that is, when $2\pi f T + \phi_{\rm offset} + \phi_{\rm laser} = \pi/2$. We make sure that the oscillation signal remains close to zero by adding the phase correction calculated from the oscillation signal, that is, $\delta\phi = \cos^{-1}(S(T)/A) - \pi/2$, to the phase of the laser light, $\phi_{\rm laser}$. The long-term measurement of the variation in the oscillation frequency, $\delta f$, is then derived from the phase correction data using $\delta\phi = 2\pi T\,\delta f$.

In addition to the change in the oscillation frequency, any change in $\phi_{\rm offset}$ in the state preparation affects the phase correction: $\delta\phi = 2\pi T\,\delta f + \delta\phi_{\rm offset}$. To correct for a contribution from this phase offset, we use signals from the two Ramsey durations ($T_{\rm short} = 5$ ms and $T_{\rm long} = 100$ ms) and calculate the difference between the phase corrections: $\delta\phi_{\rm long} - \delta\phi_{\rm short} = 2\pi(T_{\rm long} - T_{\rm short})\,\delta f$. The oscillation frequency for the state $|\Psi^{L,R}\rangle$ is given by $\delta f_{L,R} = [(\delta\phi_{\rm long} - \delta\phi_{\rm short})/2\pi(T_{\rm long} - T_{\rm short})]_{L,R}$, where the effective Ramsey duration is $T_{\rm long} - T_{\rm short} = 95$ ms.
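The conversion from phase corrections to a frequency shift described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the signal-to-amplitude ratio is clipped to [-1, 1] as a numerical safeguard that the text does not mention, and times are in seconds.

```python
import math


def phase_correction(signal, amplitude):
    """Phase correction cos^-1(S(T)/A) - pi/2 for a measured signal S(T)."""
    ratio = max(-1.0, min(1.0, signal / amplitude))  # guard against noise pushing |S/A| above 1
    return math.acos(ratio) - math.pi / 2.0


def frequency_shift(dphi_long, dphi_short, t_long=0.100, t_short=0.005):
    """Frequency shift from the difference of the long and short phase corrections.

    The common phase-offset term cancels in the difference, leaving
    2*pi*(t_long - t_short)*delta_f.
    """
    return (dphi_long - dphi_short) / (2.0 * math.pi * (t_long - t_short))
```

With the default durations the denominator corresponds to the 95 ms effective Ramsey duration quoted in the text.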

Whereas the linear Zeeman effect from a magnetic field common to both ions drops out, the linear Zeeman effect due to a magnetic field gradient does not. In a typical unshielded laboratory environment, the gradient remains stable enough to allow for contrast with Ramsey times of about 30 s (ref. 33). To remove extant frequency variations from the gradient, we take the average frequency $\delta f = (\delta f_{L} + \delta f_{R})/2$ (Extended Data Fig. 1), which now contains only contributions from the electric quadrupole shift, the quadratic Zeeman shift, a.c. Stark shifts from oscillating trapping fields, and shifts from Lorentz violation.

We characterize the effect of the electric quadrupole shift by measuring the oscillation frequency $\delta f$ as a function of the electric field gradient by changing the axial trap frequency. For our experimental set-up, we obtain $\delta f = 4.0(8)\ (\mathrm{Hz\ mm^{2}\ V^{-1}})\,E' + 8.9(8)\ (\mathrm{Hz})$, where $E'$ is the electric field gradient. At our operating axial trap frequency of 210 kHz, this translates to variations in the quadrupole shift due to changes in the axial trap frequency of 27 ± 12 mHz kHz$^{-1}$. The offset of 8.9(8) Hz is due to the quadratic Zeeman shift, which agrees with the estimated value of 8 Hz for the applied magnetic field of 3.930 G. Any change in the magnitude of the applied magnetic field near our operating value of 3.930 G gives a variation of the quadratic Zeeman shift of 4 mHz mG$^{-1}$. Using the ions as a probe, we measure both the magnetic field and the axial trap frequency during the course of the experiment and correct for their contributions to the oscillation signal. Over the course of our 23 h-long run, our axial trap frequency varies by less than ~1 kHz and the magnetic field by less than 1 mG. These instabilities translate into variations in the correction to the oscillation frequency of ~30 mHz, due to the quadrupole shift, and 3 mHz, due to the magnetic field. Fitting the model in equation (4) to the corrections only, we find that not taking into account the axial frequency instability would cause a false Lorentz-violation signal with amplitudes of less than 3 mHz, and that not correcting for the magnetic field instabilities would cause a signal with amplitudes of less than 0.5 mHz. Thus, in principle no correction for their drift would have been necessary. We note also that by measuring those quantities during the measurement run, their average contributions are expected to decrease as fast as the primary measurement signal, and should thus pose no limitation on improved Lorentz symmetry tests with longer measurement runs.
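The size of these corrections can be illustrated with a short sketch that linearizes both shifts around the operating point, using the sensitivities quoted above (27 mHz per kHz of axial trap frequency and 4 mHz per mG of magnetic field). The function names and the sign convention (a positive drift adds a positive shift that is then subtracted) are assumptions made for illustration only.

```python
# Sensitivities quoted in the text, expressed in Hz per unit drift.
QUADRUPOLE_SLOPE_HZ_PER_KHZ = 27e-3  # quadrupole shift per kHz of axial trap frequency
ZEEMAN_SLOPE_HZ_PER_MG = 4e-3        # quadratic Zeeman shift per mG of magnetic field


def correction_hz(delta_axial_khz, delta_b_mg):
    """Linearized shift (Hz) caused by drifts away from the operating point."""
    return (QUADRUPOLE_SLOPE_HZ_PER_KHZ * delta_axial_khz
            + ZEEMAN_SLOPE_HZ_PER_MG * delta_b_mg)


def corrected_frequency(measured_hz, delta_axial_khz, delta_b_mg):
    """Measured oscillation frequency with the linearized drift contribution removed."""
    return measured_hz - correction_hz(delta_axial_khz, delta_b_mg)
```

For drifts of 1 kHz and 1 mG this gives a combined correction of about 31 mHz, the same order as the ~30 mHz and 3 mHz variations stated in the text.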

The oscillating electric field from the radio-frequency electrodes of the trap induces a.c. Stark shifts of the atomic transitions of the ions. The amplitude of the oscillating field experienced by the ions depends on the stray background static electric field. For our trap, we estimate that the stray electric field in the vicinity of the ions is ~5 V cm$^{-1}$. This produces a differential a.c. Stark shift between the $|\pm1/2\rangle$ and $|\pm5/2\rangle$ states of ~120 mHz (ref. 34). The stray field is expected to be stable at better than the $10^{-2}$ level during the course of the experiment, which translates to a change of less than 4 mHz in the oscillation frequency for the two-ion state.

Statistical analysis of the data. After each measurement block (Fig. 3b), we obtain a data point for the frequency difference between both states. We then bin the data points within 60 min intervals. The error bar for each binned data point is assigned using the calculated standard deviation within each bin. To extract the amplitudes of Lorentz violation, we perform a weighted least-squares fit of the binned data points to the model given in equation (4). We scale the 1 s.d. errors of the fitted parameters with $\sqrt{\chi^2_{\rm reduced}} = 1.3$ to account conservatively for other remaining systematics.
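
The binning and error-scaling steps can be sketched as follows. This is a minimal illustration, not the analysis code: the function names are hypothetical, the sample standard deviation is assumed for the within-bin spread, and bins with fewer than two points are skipped because no spread can be computed for them.

```python
import math
from statistics import mean, stdev


def bin_data(times_s, values, width_s=3600.0):
    """Group data into fixed-width time bins (60 min by default).

    Returns a list of (bin centre, mean, standard deviation) tuples.
    """
    bins = {}
    for t, v in zip(times_s, values):
        bins.setdefault(int(t // width_s), []).append(v)
    binned = []
    for index in sorted(bins):
        vals = bins[index]
        if len(vals) < 2:
            continue  # a single point has no within-bin spread
        binned.append(((index + 0.5) * width_s, mean(vals), stdev(vals)))
    return binned


def scale_errors(errors, chi2_reduced):
    """Scale 1 s.d. errors by the square root of the reduced chi-square."""
    factor = math.sqrt(chi2_reduced)
    return [e * factor for e in errors]
```

A reduced chi-square of 1.69 corresponds to the scaling factor of 1.3 used in the text.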

Calculation of the energy shift due to the Lorentz violation for $^{40}$Ca$^{+}$. Violations of Lorentz symmetry and Einstein’s equivalence principle in bound electronic states result in a small shift of the Hamiltonian that can be described by

$$\delta H = -\left(C_0^{(0)} - \frac{2U}{3c^2}c_{00}\right)\frac{p^2}{2} - \frac{1}{6}C_0^{(2)}T_0^{(2)}$$

where we use atomic units, $p$ is the momentum of a bound electron, $U$ is the Newtonian potential and $c$ is the speed of light. The parameters $C_0^{(0)}$, $C_0^{(2)}$ and $c_{00}$ are elements of the $c_{\mu\nu}$ tensor, which characterizes Lorentz violation. The relativistic form of the $p^2$ operator is $c\gamma^0\gamma^j p_j$ (a summation is implied by repeated indices), where $\gamma^j$ are the Dirac gamma matrices. The non-relativistic form of the $T_0^{(2)}$ operator is $T_0^{(2)} = p^2 - 3p_z^2$, where $p_z$ is the component of the momentum along the quantization axis, and the relativistic form is $T_0^{(2)} = c\gamma^0(\gamma^j p_j - 3\gamma^3 p_3)$. Therefore, the shift in the Ca$^{+}$ $3d\,^2D_{5/2}$ energy level due to the $c_{\mu\nu}$ tensor depends on the values of the $\langle 3d\,^2D_{5/2}|p^2|3d\,^2D_{5/2}\rangle$ and $\langle 3d\,^2D_{5/2}|T_0^{(2)}|3d\,^2D_{5/2}\rangle$ matrix elements.


Using the Wigner–Eckart theorem we express the matrix element of the irreducible tensor operator $T_0^{(2)}$ through the reduced matrix element $\langle J\|T^{(2)}\|J\rangle$ of the operator $T^{(2)}$ as

$$\langle Jm|T_0^{(2)}|Jm\rangle = \frac{-J(J+1) + 3m^2}{\sqrt{(2J+3)(J+1)(2J+1)J(2J-1)}}\,\langle J\|T^{(2)}\|J\rangle \qquad (6)$$

The expressions for the $p^2$ and $T^{(2)}$ matrix elements are given in the supplementary material of ref. 9. The values of the angular factor in equation (6) (that is, the prefactor of the reduced matrix element) are $-0.27951 + 0.22361\,m^2$ for $3d\,^2D_{3/2}$ and $-0.21348 + 0.073193\,m^2$ for $3d\,^2D_{5/2}$.
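
The quoted angular factors follow from the standard Wigner–Eckart prefactor for a rank-2 tensor, $[-J(J+1) + 3m^2]/\sqrt{(2J+3)(J+1)(2J+1)J(2J-1)}$. The short sketch below evaluates that prefactor as a numerical cross-check; the function name is hypothetical.

```python
import math


def angular_factor(j, m):
    """Prefactor of the reduced matrix element for <J m|T_0^(2)|J m>."""
    denominator = math.sqrt((2 * j + 3) * (j + 1) * (2 * j + 1) * j * (2 * j - 1))
    return (-j * (j + 1) + 3 * m * m) / denominator


# Constant term and m^2 coefficient for J = 3/2 and J = 5/2.
constant_d32 = angular_factor(1.5, 0.0)
slope_d32 = angular_factor(1.5, 1.0) - angular_factor(1.5, 0.0)
constant_d52 = angular_factor(2.5, 0.0)
slope_d52 = angular_factor(2.5, 1.0) - angular_factor(2.5, 0.0)
```

Evaluating the factor at $m = 0$ gives the constant term, and the difference between $m = 1$ and $m = 0$ gives the coefficient of $m^2$.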

First we calculated the required matrix elements in a lowest-order Dirac–Fock (DF) approximation and then with an additional random-phase approximation (DF+RPA). Next we carried out much more accurate calculations using the configuration interaction method with single and double excitations (CI+SD) and four variants of the all-order (linearized coupled-cluster) method (ref. 35). The virial theorem is also used for the $p^2$ calculations.

The results are summarized in Extended Data Table 1. We note that we list the reduced matrix elements for the $T^{(2)}$ operator but actual matrix elements for the $p^2$ operator because there is no necessity to introduce reduced matrix elements for a scalar operator. The values in columns DF(FC) and DF are lowest-order Dirac–Fock values calculated with and without the frozen-core approximation. In the frozen-core approximation the Dirac–Fock equations for the core electrons are solved self-consistently first and the valence orbital is calculated with an unchanged, that is, ‘frozen’, core. For the $p^2$ operator such an approximation appears to give very poor results for the 3d states. If the core orbitals are allowed to vary together with the valence orbital, the lowest-order value differs by only 16% from the final virial theorem value. Addition of the RPA correction to the frozen-core Dirac–Fock value fixes this problem as well, because RPA corrections describe the reaction of the core electrons to an externally applied perturbation. The perturbation produced by the operator $p^2$ is very large and, as a result, the RPA corrections for $\langle\psi|p^2|\psi\rangle$ matrix elements are large. Such a problem does not arise for the $T^{(2)}$ operator; the correlation correction to its matrix elements is much smaller and the accuracy of the resulting values is much higher.

The CI+SD calculations are carried out using the Dirac—Fock basis for the oc- 
cupied core and valence atomic states and the Dirac-Fock-Sturm basis for unoc- 
cupied virtual orbitals; the frozen-core approximation is not used. The description 
of the Dirac-Fock-Sturm equations is given in refs 36, 37. The configuration state 
functions are constructed from the one-electron wavefunctions as a linear combi- 
nation of Slater determinants. The set of the configuration state functions is gen- 
erated including all single and double excitations into one-electron states of the 
positive spectrum. Single excitations are allowed to all core shells; double excitations 
are allowed to 3s and 3p core shells. 

To calculate the value $\langle v|p^2|v\rangle$, where $|v\rangle$ is the valence electron wavefunction, we also used the approach based on the virial theorem. In the nonrelativistic limit the virial theorem can be written in the form

$$\langle\Psi|\sum_{i=1}^{N} p_i^2|\Psi\rangle = -2E$$

where $E$ is the total energy of the system, $N$ is the total number of electrons and $|\Psi\rangle$ is the total wavefunction of all electrons in the atom. Therefore, the value $\langle v|p^2|v\rangle$ can be calculated using the removal energies of the valence electron. The virial theorem makes it possible to calculate the expectation value of the $p^2$ operator as twice the difference of the total energies $E_N$ and $E_{N-1}$ of the $N$ and $N - 1$ electron systems. Because the differential energy $E$ can be calculated with an accuracy much higher than the wavefunction $\Psi$, this approach is appropriate for the light atoms and ions where relativistic effects are negligible. The virial theorem results that use experimental data for the 3d removal energies from ref. 38 are listed in column VT in Extended Data Table 1.

We have also carried out the calculations of the $\langle v|p^2|v\rangle$ and $\langle v\|T^{(2)}\|v\rangle$ matrix elements using the all-order (linearized coupled-cluster) method (ref. 35). The all-order method gave very accurate values of the $3d_J$ lifetimes and quadrupole moments in a Ca$^{+}$ ion. In the all-order method, single, double and partial triple excitations of Dirac–Hartree–Fock wavefunctions are included to all orders of perturbation theory. We refer the reader to the review in ref. 35 for the description of the all-order method and its applications. Both single-double (SD) and single-double-partial-triple (SDpT) ab initio all-order calculations were carried out. In addition, a scaling of the dominant terms was carried out for both SD and SDpT calculations to improve the accuracy and to evaluate the uncertainty of the final values. The calculations were carried out with both nonrelativistic and relativistic operators; the differences were found to be negligible at the present level of accuracy. The values calculated with relativistic operators are listed in Extended Data Table 1.

The virial theorem values are taken as final for the matrix element of the $p^2$ operator. The uncertainty of 12% is estimated as the difference of the virial theorem and all-order values. The SD scaled values are taken as final for the $T^{(2)}$ operator (see refs 39, 40 for the discussion of the choice of the final all-order values). The uncertainty is determined as the spread of the four all-order values. On substituting the final all-order values of the $\langle 3d\,^2D_J\|c\gamma^0(\gamma^j p_j - 3\gamma^3 p_3)\|3d\,^2D_J\rangle$ matrix element into equation (6) and using the virial theorem value of $\langle 3d\,^2D_J|p^2|3d\,^2D_J\rangle$, we get

$$\frac{\Delta E}{h} = \left(C_0^{(0)} - \frac{2U}{3c^2}c_{00}\right)\times(-2.46\times10^{15}\ \mathrm{Hz}) + C_0^{(2)}\times(2.17\times10^{15} - 1.47\times10^{15}\,m^2)\ \mathrm{Hz}$$

for $3d\,^2D_{3/2}$, and

$$\frac{\Delta E}{h} = \left(C_0^{(0)} - \frac{2U}{3c^2}c_{00}\right)\times(-2.46\times10^{15}\ \mathrm{Hz}) + C_0^{(2)}\times(2.16\times10^{15} - 7.42\times10^{14}\,m^2)\ \mathrm{Hz}$$

for $3d\,^2D_{5/2}$, where the uncertainties in the frequency coefficients of $\left(C_0^{(0)} - \frac{2U}{3c^2}c_{00}\right)$ and $C_0^{(2)}$ are estimated to be 12% and 2%, respectively, and the atomic units are converted to SI units using 1 a.u. = $h \times (6.57968 \times 10^{15}\ \mathrm{Hz})$.

The frequency difference (in Hz) between the shifts of the $m_J = 5/2$ and $m_J = 1/2$ states for a pair of $^{40}$Ca$^{+}$ ions used in our experiment is given by

$$\frac{1}{h}\left(E_{m_J=5/2} - E_{m_J=1/2}\right) = (-1.484\times10^{15}\ \mathrm{Hz})\times\left((5/2)^2 - (1/2)^2\right)C_0^{(2)} = (-8.9(2)\times10^{15}\ \mathrm{Hz})\times C_0^{(2)}$$
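
As a rough arithmetic cross-check of the $3d\,^2D_{5/2}$ coefficients, the sketch below combines the final reduced matrix element from Extended Data Table 1 (9.25 a.u.), the angular factor quoted for equation (6), the factor $-1/6$ from the Hamiltonian shift and the atomic-unit conversion. The function and variable names are hypothetical, and the factor of two for the ion pair is an assumption based on the two-ion states used in the experiment.

```python
AU_TO_HZ = 6.57968e15  # 1 a.u. of energy divided by h, in Hz (value from the text)


def t2_shift_coefficients(reduced_me_au, constant_factor, m2_factor):
    """Coefficients (Hz) of C_0^(2) in the single-ion shift: constant term and m^2 term."""
    scale = -(reduced_me_au / 6.0) * AU_TO_HZ
    return scale * constant_factor, scale * m2_factor


# 3d 2D5/2: reduced matrix element 9.25 a.u., angular factor -0.21348 + 0.073193 m^2.
constant_hz, m2_hz = t2_shift_coefficients(9.25, -0.21348, 0.073193)

# Two ions, each contributing the difference between m = 5/2 and m = 1/2.
pair_difference_hz = 2.0 * m2_hz * ((5.0 / 2.0) ** 2 - (1.0 / 2.0) ** 2)
```

Under these assumptions the constant term is close to $2.16\times10^{15}$ Hz, the $m^2$ term close to $-7.42\times10^{14}$ Hz and the pair difference close to $-8.9\times10^{15}$ Hz.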


Frame dependence of the $c_{\mu\nu}$ tensor. Because of the Earth’s motion, $c_{\mu\nu}$ in the local laboratory frame varies according to the time-dependent Lorentz transformation given by

$$c_{\mu\nu} = \Lambda_{\mu}^{\ M}\Lambda_{\nu}^{\ N}c_{MN}$$

where $\Lambda$ is the Lorentz transformation matrix and $c_{MN}$ is $c_{\mu\nu}$ written in the Sun-centred, celestial-equatorial frame (SCCEF). The matrix $\Lambda$ consists of a rotation and a velocity boost of the experiment with respect to the Sun. In the laboratory frame, we define the x axis to point to the East, the y axis to point to the North and the z axis to point upward. The rotation matrix that transforms from the SCCEF to the local laboratory frame is given by

$$R = \begin{pmatrix} -\sin(\omega_\oplus T) & \cos(\omega_\oplus T) & 0 \\ -\cos(\chi)\cos(\omega_\oplus T) & -\cos(\chi)\sin(\omega_\oplus T) & \sin(\chi) \\ \sin(\chi)\cos(\omega_\oplus T) & \sin(\chi)\sin(\omega_\oplus T) & \cos(\chi) \end{pmatrix}$$

where the angle $\chi \approx 52.1°$ is the colatitude of the experiment (Berkeley, California), $T$ is time since the vernal equinox of 2014 and $\omega_\oplus = 2\pi/(23.93\ \mathrm{h})$ is the sidereal angular frequency of the Earth’s rotation. The boost of the experiment in the SCCEF is given by

$$\boldsymbol{\beta} = \begin{pmatrix} -\beta_\oplus\sin(\eta)\cos(\Omega T) \\ \beta_\oplus\cos(\eta)\cos(\Omega T) - \beta_L\sin(\chi)\cos(\omega_\oplus T) \\ -\beta_\oplus\sin(\Omega T) + \beta_L\sin(\chi)\sin(\omega_\oplus T) \end{pmatrix}$$

where $\beta_\oplus \approx 10^{-4}$ is the boost from the Earth’s orbital velocity, $\beta_L \approx 1.5 \times 10^{-6}$ is the boost from the Earth’s rotation, $\Omega$ is the yearly sidereal angular frequency and $\eta \approx 23.4°$ is the angle between the ecliptic plane and the Earth’s equatorial plane. The parameter relevant to our experiment is $C_0^{(2)}$. With the Lorentz transformation applied to $c_{\mu\nu}$ in the SCCEF, we can write the value of $C_0^{(2)}$ in the local laboratory frame in terms of $c_{\mu\nu}$ in the SCCEF as

$$C_0^{(2)} = A + \sum_j\left(C_j\cos(\omega_j T) + S_j\sin(\omega_j T)\right)$$

where the index $j$ runs over all angular frequencies ($\omega_j$) and the corresponding amplitudes ($C_j$, $S_j$) given in Extended Data Table 2, and $A$ is a constant offset. For our 23 h measurement, the time-dependent Lorentz-violation signal is given at leading order by

$$C_0^{(2)} = -3\sin(2\chi)\,c'_{XZ}\cos(\omega_\oplus T) - 3\sin(2\chi)\,c'_{YZ}\sin(\omega_\oplus T) - \frac{3}{2}\left(c'_{XX} - c'_{YY}\right)\sin^2(\chi)\cos(2\omega_\oplus T) - 3c'_{XY}\sin^2(\chi)\sin(2\omega_\oplus T)$$

We fit our binned 23 h measurement data to this model and extract Lorentz-violation parameters. In Table 1 we report uncorrelated combinations of parameters by diagonalizing the covariance matrix from the fit. We scale the 1 s.d. uncertainties from the fit with $\sqrt{\chi^2_{\rm reduced}} = 1.3$ to account conservatively for other remaining systematics.

With a year-long measurement, we expect to reach a sensitivity of 1 mHz in the 
ions’ oscillation frequency. This level of sensitivity allows us to bound c;y, cpy and 
Cry at the 10° '® level, which will improve the present limits” on these parameters 
by at least an order of magnitude. 
Extended Data Figure 1 | Cancellation of the contributions from the magnetic field gradient. The frequency measurements (in Hz) of the states $|\Psi^{L}\rangle$ and $|\Psi^{R}\rangle$ for a Ramsey duration of 100 ms are shown in the top green ($f_L$) and bottom blue ($f_R$) data sets, respectively. We offset both data sets for visualization purposes. The contribution from the magnetic field gradient is subtracted out in the average frequency $f = (f_L + f_R)/2$, which is shown as red data points.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Table 1 | Lowest-order DF, DF+RPA, CI+SD and all-order results for the $\langle 3d\,^2D_J|p^2|3d\,^2D_J\rangle$ and $\langle 3d\,^2D_J\|T^{(2)}\|3d\,^2D_J\rangle$ matrix elements in Ca$^{+}$ in atomic units

| Matrix element | DF(FC) | DF | RPA | CI+SD | All-order | VT | Final |
|---|---|---|---|---|---|---|---|
| $\langle 3d\,^2D_{3/2}\vert p^2\vert 3d\,^2D_{3/2}\rangle$ | 3.05 | 0.67 | 0.66 | 0.73 | 0.83 | 0.748 | 0.75(9) |
| $\langle 3d\,^2D_{5/2}\vert p^2\vert 3d\,^2D_{5/2}\rangle$ | 3.04 | 0.66 | 0.66 | 0.73 | 0.83 | 0.748 | 0.75(9) |
| $\langle 3d\,^2D_{3/2}\Vert T^{(2)}\Vert 3d\,^2D_{3/2}\rangle$ | 5.45 | 6.22 | 5.72 | 6.89 | 7.09 | | 7.09(12) |
| $\langle 3d\,^2D_{5/2}\Vert T^{(2)}\Vert 3d\,^2D_{5/2}\rangle$ | 7.12 | 8.11 | 7.47 | 8.98 | 9.25 | | 9.25(15) |

The virial theorem values are listed in column VT. The values in columns DF(FC) and DF are lowest-order Dirac–Fock values calculated with and without the frozen-core approximation.
Extended Data Table 2 | Amplitudes of various frequency components for $C_0^{(2)}$ expressed in terms of $c_{\mu\nu}$ in the SCCEF


eel 


Cj 


Sj 


Q - We 
Q+ We 
20 - we 
20 + We 
Q - 2We 


Q + 2We 
22 - 2we 
20 + 2we 


-3 sin(2y)cyz + 2cryBL 
(chy — yy) sin’ (y) 
-5Ba(3 cos(2y) + L)(cpy cos(7) — 2c57 sin(7)) 

0 

3BaC'py Sin) sin(2y) 

5BaCpy sin(7) sin(2y) 
0 
0 


3 sin(2y)c'p, — 2chryBr 
—3cyy sin*(y) 
$Bac'-y(3 cos(2y) + 1) 
0 
—3Be sin(2x) (c’py sin (m) + cpz(1 + cos (q))) 
—3Bo sin(2y) (ci-z(1 — cos()) - cfpy sin()) 
0 
0 
—3B 9 Ci cos” (3) sin*(y) 
—3Boc'py sin? (4) sin?(y) 
0 
0 


The frequencies $\omega_\oplus$ and $\Omega$ are the daily and yearly sidereal angular frequencies, respectively. The angle $\chi \approx 52.1°$ is the colatitude of the experiment (Berkeley, California). The angle $\eta \approx 23.4°$ is the angle between the plane of the ecliptic and the Earth’s equatorial plane. $\beta_\oplus \approx 10^{-4}$ is the boost from the Earth’s orbital velocity and $\beta_L \approx 1.5 \times 10^{-6}$ is the boost from the Earth’s rotation. For our 23 h measurement, contributions from these two boosts are negligible.
Anomalous dispersions of ‘hedgehog’ particles

Joong Hwan Bahng, Bongjun Yeom, Yichun Wang, Siu On Tung, J. Damon Hoff & Nicholas Kotov


Hydrophobic particles in water and hydrophilic particles in oil aggregate, but can form colloidal dispersions if their surfaces are chemically camouflaged with surfactants, organic tethers, adsorbed polymers or other particles that impart affinity for the solvent and increase interparticle repulsion. A different strategy for modulating the interaction between a solid and a liquid uses surface corrugation, which gives rise to unique wetting behaviour. Here we show that this topographical effect can also be used to disperse particles in a wide range of solvents without recourse to chemicals to camouflage the particles’ surfaces: we produce micrometre-sized particles that are coated with stiff, nanoscale spikes and exhibit long-term colloidal stability in both hydrophilic and hydrophobic media. We find that these ‘hedgehog’ particles do not interpenetrate each other with their spikes, which markedly decreases the contact area between the particles and, therefore, the attractive forces between them. The trapping of air in aqueous dispersions, solvent autoionization at highly developed interfaces, and long-range electrostatic repulsion in organic media also contribute to the colloidal stability of our particles. The unusual dispersion behaviour of our hedgehog particles, overturning the notion that like dissolves like, might help to mitigate adverse environmental effects of the use of surfactants and volatile organic solvents, and deepens our understanding of interparticle interactions and nanoscale colloidal chemistry.

We imparted strong corrugation onto the surface of carboxylated
polystyrene microspheres (μPSs) by attaching rigid zinc oxide (ZnO)
nanoscale spikes (‘nanospikes’). This involves initial adsorption of positively
charged ZnO nanoparticles (NPs) onto the negatively charged μPSs and
subsequent growth of ZnO nanowires using established protocols. The
resultant hedgehog particles combine micro- and nanoscale
structural features (Fig. 1a), and their geometrical and topographical 
specifications can be adjusted by changing the growth conditions to 
modify the surface densities, lengths and diameters of nanospikes and 
the PS diameters (Fig. 1b-e, Supplementary Information section 1 and 
Supplementary Figs 2-5). 

As-made hedgehog particles, with their polar ZnO surfaces, are highly 
hydrophilic. They form excellent dispersions in water (Fig. 1f, l) and
other hydrophilic solvents. When rendering the hedgehog particles hy- 
drophobic by silanization of the ZnO nanospikes with (7-octen-1-yl) 


Figure 1 | Hedgehog particles. a, Negatively charged, carboxylate-terminated
μPSs are used as core templates (1) on which positively charged ZnO NPs are
adsorbed (2). ZnO nanospikes are grown from ZnO nanoparticles (3) to a
designed length (4, 6). Hedgehog particles are rendered hydrophobic by
exposure to OTMS or PFTS (5). b–e, SEM images of hedgehog particles with
different ZnO nanospike lengths: 0.19 μm (b), 0.27 μm (c), 0.4 μm (d),
0.6 μm (e). f, Confocal microscopy of an aqueous dispersion of hydrophilic
hedgehog particles with fluorescently labelled μPSs. Inset, SEM image for the
same hedgehog particles. g, h, SEM (g) and confocal microscopy (h) of an
aqueous dispersion of OTMS-HPs. i, SEM image of particles from the bulk of an
aqueous OTMS-HP dispersion collected five days after initial preparation.
j, k, Confocal microscopy images of fluorescent OTMS-HPs (green,
λmax = 486 nm) with adsorbed hydrophobic CdSe nanoparticles (red,
λmax = 655 nm) in an aqueous dispersion (j) and in the dried state (k).
l, Photographs of aqueous dispersions of (left to right) hydrophilic hedgehog
particles (HPs) with green-dyed μPSs, OTMS-HPs, OTMS-μPSs and OTMS-ZnO
nanowires (NWs). m, Photographs of (left to right) ZnO nanoparticles (NPs) in
water, ZnO nanoparticles in 1 M NaCl, and OTMS-HPs in 1 M NaCl.
n, Photographs of OTMS-HPs in (left to right) 0.1 M NaCl and 0.01 M NaCl.
trimethoxysilane (OTMS) or 1H,1H,2H,2H-perfluorooctyltriethoxysilane
(PFTS) (Supplementary Methods; spectroscopic evidence of silaniza- 
tion is shown in Supplementary Information section 2 and Supplemen- 
tary Fig. 9c), they form stable dispersions in heptane and hexane 
(Supplementary Information section 2 and Supplementary Fig. 6a, b). 

Surprisingly, highly corrugated OTMS-modified hydrophobic hedge- 
hog particles (OTMS-HPs) also form dispersions in water (Fig. 1g, h, l), 
and hydrophilic hedgehog particles disperse in representative hydro- 
phobic solvents such as heptane, hexane and toluene (Fig. 4a–h). This
illustrates that surface topography can be used to modulate the inter- 
action between microscale particles and disperse them in phobic solvents. 

Immediately on sonicating various hydrophobic OTMS-HP and PFTS-modified
hydrophobic hedgehog particle (PFTS-HP) formulations in
water (Supplementary Information section 2 and Supplementary Fig. 7), 
we observed the formation of a precipitate on the bottom of the vial, 
floating aggregates on top of the liquid, and a stable opalescent disper- 
sion. Dispersions remain stable and free of aggregation for a subset 
of particles for at least five days, as verified by scanning electron microscopy
(SEM) (Fig. 1i) and dynamic light scattering (Supplementary
Information section 2 and Supplementary Table 1). The percentage of 
hydrophobic hedgehog particle aggregates floating on the surface of the 
dispersions increased with elongation of the nanospikes (Supplementary 
Information section 2 and Supplementary Fig. 7a–d), but the colloidal
stability of the particles dispersed in water was also enhanced (Fig. 2k-m). 

To exclude the possibility that the observed behaviour arises because 
our samples contain a subpopulation of hydrophilic OTMS-HPs or 
PFTS-HPs or represent a special case of Janus colloids, that is, colloids 




Figure 2 | Interface between hydrophobic hedgehog particles and water.
a–h, Confocal microscopy images of hydrophobic hedgehog particles labelled
with hydrophobic CdSe NPs in aqueous dispersions (a, b) and hydrophilic
hedgehog particles labelled with hydrophilic CdTe NPs in aqueous dispersions
(c, d); hydrophobic hedgehog particles in an aqueous solution containing
hydrophilic TGA-stabilized CdTe nanoparticles with green (λmax = 540 nm)
emission (e); hydrophilic hedgehog particles in an aqueous solution
containing hydrophilic TGA-stabilized CdTe nanoparticles (f); the same sample
from image e after five days of storage in dark (g); and the same sample from
image f after five days of storage in dark (h). i, SEM image of a hydrophobic
hedgehog particle with a self-assembled film of TGA-depleted CdTe
nanoparticles between the ZnO nanospikes, indicating the location of the
air–water interface. The hydrophobic hedgehog particles were immersed in an
aqueous solution of CdTe nanoparticles for 72 h. j, Schematic diagram of the
air–water interface, showing the experimental parameters (definitions in
Supplementary Information). k–m, SEM images of aqueous dispersions of
hydrophobic hedgehog particles with ZnO nanospike lengths of 0.19 μm (k),
0.40 μm (l) and 0.57 μm (m).


consisting of distinct hydrophobic and hydrophilic interfacial sectors,
we directly probed the hydrophobic nature of the particles after proces- 
sing them into dried thin films. The filtrate of suspended OTMS-HPs 
exhibited high water repellency causing the droplets to roll off (the ‘lotus 
effect’; Supplementary Information section 2, Supplementary Videos 
1-3 and Supplementary Fig. 8). Further evidence is obtained by injecting 
hydrophobic cadmium selenide (CdSe) nanoparticles into an aqueous 
dispersion of OTMS-HPs: confocal and transmission electron micro- 
scopy (TEM) images show the expected anchoring of hydrophobic 
nanoparticles on the spikes (Fig. 1j, k, Supplementary Information sec- 
tion 2 and Supplementary Fig. 9d), thus confirming their hydrophobi- 
city and the uniformity of surface derivatization. The stability of the 
hydrophobic hedgehog particles in aqueous dispersion did not change 
on CdSe adsorption. 

The wetting of corrugated surfaces is often attributed to a Cassie–Baxter
wetting mode and in our case could include formation of an air shell in the
vicinity of the μPS core. Such trapped air bubbles might provide buoyancy to
the hedgehog particles, but are known to be thermodynamically unstable. The
presence of trapped air is verified by adding ethanol and observing gas
evolving from the dispersion (Supplementary Video 5), and by observing, in
high-resolution confocal microscopy images of the particles, concentric
shells with markedly different refractive indices, as would be expected if an
air shell is present (Fig. 2a, b,
Supplementary Information section 3 and Supplementary Fig. 10).
Hydrophilic hedgehog particles in water have no air shells, and they 
appear under the same conditions as uniformly lit particles (Fig. 2c, d, 
Supplementary Information section 3 and Supplementary Fig. 11). 


When fluorescent cadmium telluride (CdTe) nanoparticles, stabilized
with hydrophilic thioglycolic acid (TGA), are added to an aqueous dispersion
of the hydrophobic hedgehog particles, a dark zone devoid of
emission around the hedgehog particles confirms the presence of a layer 
of air. The dimensions of the emission exclusion zones closely match the 
diameter of hedgehog particles (Fig. 2e). The fact that similar images 
were obtained after five days of storage in the dark without agitation 
(Fig. 2g) attests to the long-term stability of dispersions of our hydro- 
phobic hedgehog particles in water, consistent with long-term trapping 
of air at macroscale corrugated surfaces (Supplementary Information
section 3 and Supplementary Fig. 13). Hydrophilic hedgehog particles in 
identical luminescent media appear as bright spots with CdTe nanopar- 
ticles localized on and between the nanospikes (Fig. 2f, h). 

Strong scattering of photons and electrons by ZnO nanospikes prevents
successful optical or cryogenic TEM imaging of the air–water interface
within hedgehog particles, but we can locate it by taking advantage of the
fact that CdTe nanoparticles can self-assemble into nanowires and nanosheets
at interfaces: a thin layer of CdTe nanoparticles that assembles more than
200 nm in from the ends of the ZnO nanospikes (Fig. 2i, Supplementary
Information section 3 and Supplementary Fig. 14) pinpoints the water
meniscus (Fig. 2j). This allows us to calculate an average hedgehog particle
density of 0.92 g cm⁻³ (Supplementary Information section 3), which closely
matches the density of water and explains the buoyancy of the particles.

We must also explain why two individual hydrophobic hedgehog particles do
not coalesce on collision. To do so, we refer to the extended
Derjaguin–Landau–Verwey–Overbeek (E-DLVO) theory, according to which the sum
of potentials associated with van der Waals ($V_{\text{vdW}}$), electrical
double layer ($V_{\text{DL}}$) and hydrophobic ($V_{\text{HB}}$) interactions
approximates the total interaction potential ($V_{\text{E-DLVO}}$) between
the hydrophobic hedgehog particles:
$V_{\text{E-DLVO}} = V_{\text{vdW}} + V_{\text{DL}} + V_{\text{HB}}$.
Evaluating interparticle interactions in different configurations
(Fig. 3a–c, Supplementary Information section 4 and Supplementary Fig. 19),
we find that hedgehog-particle/hedgehog-particle pair potentials display high
repulsive energy barriers of at least $14\,k_{\text{B}}T$ ($k_{\text{B}}$,
Boltzmann’s constant) for the outer contour of spikes (x = 0; Fig. 3d).
Penetration of the nanospikes into the interstitial spaces of another
hedgehog particle (x < 0) is energetically unfavourable (Fig. 3d).
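
The additive structure of this pair potential can be sketched numerically. The following is a minimal illustration only, not the calculation reported in the Supplementary Information: it assumes textbook expressions for two smooth, equal spheres in the Derjaguin limit, an assumed single-exponential form for the hydrophobic term, and placeholder parameter values (the `PARAMS` dictionary and all function names are hypothetical).

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def v_vdw(h, radius, hamaker):
    """Van der Waals attraction of two equal smooth spheres (valid for h << radius)."""
    return -hamaker * radius / (12.0 * h)


def v_dl(h, radius, psi, debye_len, eps_r=78.4):
    """Double-layer repulsion of two equal spheres at constant, low surface potential."""
    return (2.0 * math.pi * eps_r * EPS0 * radius * psi ** 2
            * math.log1p(math.exp(-h / debye_len)))


def v_hb(h, radius, strength, decay_len):
    """Assumed single-exponential hydrophobic attraction (empirical form)."""
    return -strength * radius * math.exp(-h / decay_len)


def v_e_dlvo_in_kT(h, params, temp_k=298.15):
    """Total pair potential V_vdW + V_DL + V_HB, expressed in units of k_B*T."""
    total = (v_vdw(h, params["radius"], params["hamaker"])
             + v_dl(h, params["radius"], params["psi"], params["debye_len"])
             + v_hb(h, params["radius"], params["strength"], params["decay_len"]))
    return total / (K_B * temp_k)


def boltzmann_factor(barrier_in_kT):
    """Relative statistical weight exp(-E/k_B T) of a state at the top of a barrier."""
    return math.exp(-barrier_in_kT)


# Hypothetical placeholder values, NOT taken from the paper.
PARAMS = {
    "radius": 0.5e-6,     # sphere radius, m
    "hamaker": 1.0e-20,   # Hamaker constant, J
    "psi": -0.035,        # surface potential, V (within the range quoted for air bubbles)
    "debye_len": 10e-9,   # Debye length, m
    "strength": 1.0e-12,  # hydrophobic prefactor, J per metre of radius
    "decay_len": 1.0e-9,  # hydrophobic decay length, m
}

for h_nm in (1, 2, 5, 10, 20):
    print(h_nm, "nm:", round(v_e_dlvo_in_kT(h_nm * 1e-9, PARAMS), 1), "k_B T")

# A barrier of 14 k_B T corresponds to a Boltzmann factor below 1e-6.
print(boltzmann_factor(14))
```

Because the three terms are simply summed, replacing any one of them with a geometry-specific expression (for example, one that accounts for the small contact area of spike tips) changes the total without touching the others.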

Comparison of $V_{\text{E-DLVO}}$ for hydrophobic hedgehog particles with
that for hydrophobic μPSs (Supplementary Information section 4 and
Supplementary Fig. 24d) shows that the interfacial corrugations transform
the overall attractive potential into a repulsive one. For hedgehog
particles with short nanospikes, $V_{\text{E-DLVO}}$ reverses such that the
interaction becomes attractive (Fig. 3e), matching the experimental results
in Fig. 2k. The key reason for the anomalous stability of hedgehog particle
dispersions is that $V_{\text{vdW}}$ and $V_{\text{HB}}$ are greatly
decreased for corrugated particles compared with the smooth spheres
(Fig. 3f, g), owing to the drastic reduction in the contact area in the
former case. The total contour area of tapered spikes represents <3% of the
surface area of the smooth particles (Fig. 1a).

The colloidal stability of hydrophobic hedgehog particles in water is
also enhanced by the presence of the electrical double layer at the
air–water interface, increasing $V_{\text{DL}}$. The zeta-potential (ζ) of
air bubbles, which affects their electrostatic repulsion, is known to be
between −35 mV (ref. 22) and −65 mV (ref. 23). Such high ζ is attributed to
autoionization of water that may also occur at the hydrophobic interfaces.
However, the fact that the hedgehog particle dispersion remains stable in
the presence of 0.01–1.0 M NaCl, which leads to screening of electrostatic
interactions and coagulation of ‘normal’ dispersions (Fig. 1m, n), shows
that any increased electrostatic repulsion has a secondary role and that the
anomalous colloidal behaviour of hedgehog particles is dominated by the
reduction of attractive interactions between the particles. But limitations
of Derjaguin–Landau–Verwey–Overbeek theory for high ionic strengths
and nanoscale corrugated surfaces may need to be considered for a
more complete mechanistic explanation.
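
To give a sense of how strongly 0.01–1.0 M NaCl screens electrostatic interactions, the sketch below evaluates the standard Debye screening length for a fully dissociated 1:1 electrolyte. It is an illustrative estimate under stated assumptions (water at 25 °C with relative permittivity 78.4; the function name is hypothetical) and is not part of the authors' analysis.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def debye_length_nm(molar, temp_k=298.15, eps_r=78.4):
    """Debye length (nm) for a 1:1 electrolyte of the given molar concentration."""
    ionic_strength = molar * 1000.0  # mol/m^3; equals the concentration for a 1:1 salt
    kappa_sq = (2.0 * N_A * E_CHARGE ** 2 * ionic_strength
                / (eps_r * EPS0 * K_B * temp_k))
    return 1e9 / math.sqrt(kappa_sq)


# NaCl concentrations used in the text.
for conc in (0.01, 0.1, 1.0):
    print(f"{conc} M NaCl: {debye_length_nm(conc):.2f} nm")
```

Under these assumptions the screening length falls from about 3 nm at 0.01 M to about 0.3 nm at 1.0 M, far shorter than the nanospikes themselves, which is consistent with the argument that electrostatic repulsion plays only a secondary role in salt solutions.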
Figure 3 | Interaction potentials of hydrophobic hedgehog particles in
aqueous media. a–c, Two general configurations, spike-to-spike (S–S; a) and
spike-to-gap (S–G; b), are considered, along with the intermediate case in
which the ZnO nanospikes face the side walls of opposing particles (ZS–G; c).
d–g, Interaction potentials between hydrophobic hedgehog particles. d, Pair
potentials for hydrophobic hedgehog particles in S–S
($V_{\text{E-DLVO,S-S}}$, black), S–G ($V_{\text{E-DLVO,S-G}}$, orange) and
ZS–G ($V_{\text{E-DLVO,ZS-G}}$, green) configurations. The negative values of
x correspond to the penetration of ZnO nanospikes into the interstitial
spaces of another hedgehog particle; x = 0 corresponds to the outer contour
around the spike tips. e, Pair potentials ($V_{\text{E-DLVO,HP}}$) of
hydrophobic hedgehog particles in an aqueous dispersion calculated according
to the E-DLVO theory for the zeta-potentials at the air–water interface with
ζ = −65 mV (black line) and ζ = −35 mV (red line), and for hydrophobic
hedgehog particles with short nanospikes from Fig. 2k (green line).
f, Hydrophobic interaction potentials of OTMS-HPs ($V_{\text{HB,HP}}$, green)
and OTMS-μPSs ($V_{\text{HB,PS}}$, dotted green). g, Van der Waals
interaction potentials of OTMS-HPs ($V_{\text{vdW,HP}}$, blue) and OTMS-μPSs
($V_{\text{vdW,PS}}$, dotted blue) and total attractive potentials of
OTMS-HPs ($V_{\text{vdW+HB,HP}}$, red) and hydrophobic OTMS-μPSs
($V_{\text{vdW+HB,PS}}$, dotted red) in water.


If the drastic reduction in attractive components of the pair potential is 
the reason for the unusual stability of hedgehog particle dispersions, the 
same effect should occur in dispersions of hydrophilic colloids in hydro- 
phobic solvents. Stable dispersions of hydrophilic hedgehog particles 
were obtained in heptane, hexane and toluene (Fig. 4a, Supplementary 
Information section 5 and Supplementary Fig. 26). SEM and confocal 
microscopy images (Fig. 4b-e) demonstrated non-agglomerated part- 
icles in the bulk of these dispersions and physical integrity of hedgehog 
particles (Fig. 4f-h). The PS core of the hedgehog particles was dis- 
solved in toluene, thus yielding a dispersion of hydrophilic hedgehog 
particle shells. As expected, ZnO nanoparticles and ZnO nanowires 
(Supplementary Information section 5 and Supplementary Fig. 27) do 
not disperse in the same solvents. 

Calculations show that $V_{\text{vdW}}$ for this type of dispersion is much
reduced compared with smooth spheres, and that the overall pair potential
of hydrophilic hedgehog particles in heptane is indeed repulsive with
$V_{\text{DLVO,HP}} = 1.4\,k_{\text{B}}T$ at x = 0 nm (Fig. 4i, j,
Supplementary Information section 5 and Supplementary Fig. 28). Notably,
dispersions in organic solvents lack the air layer between the spikes, and
electrostatic interactions in organic solvents are not screened as in
aqueous solutions and are therefore longer ranged.

The stability of our surfactant-free hedgehog particles in ‘phobic’ sol- 
vents offers a different perspective on scientific and technological problems 
related to colloidal interactions and might even enable new strategies 
Figure 4 | Dispersion of hydrophilic hedgehog particles in hydrophobic
organic solvents. a, Dispersions of hydrophilic hedgehog particles in (left to
right) heptane, hexane and toluene. As in the case of dispersion of hydrophobic
hedgehog particles in water, after sonication of hydrophilic hedgehog particles
in organic solvent there was always a small amount of precipitate in the bottom
of the vial. b, Confocal microscopy image of hydrophilic hedgehog particles in
heptane. c–e, SEM images of hydrophilic hedgehog particles from dispersions
in heptane (c), hexane (d) and toluene (e). f–h, SEM images of individual
hedgehog particles in heptane (f), hexane (g) and toluene (h). Toluene dissolves
the μPS core in the hedgehog particles, rendering dispersions of hydrophilic
spiky shells. i, Van der Waals interaction potentials $V_{\text{vdW}}$ of
hydrophilic hedgehog particles (blue) and μPSs (red) in heptane. j, Total pair
potential $V_{\text{E-DLVO}}$ of hydrophilic hedgehog particles in heptane.
for processing and dealing with colloids, for developing new drug delivery
systems, and for colloidal self-assembly. We also believe that
the unusual solvation behaviour of hedgehog particles (contrary to the 
traditional expectations of particle dispersion stability in hydrophobic/ 
hydrophilic solvents) could be used to develop efficient adsorbers, ab- 
sorbers, scatterers or catalysts that need to function in both organic and 
aqueous media. 
Metal-catalysed azidation of tertiary C-H bonds
suitable for late-stage functionalization


Ankit Sharma & John F. Hartwig


Many enzymes oxidize unactivated aliphatic C-H bonds selectively to
form alcohols; however, biological systems do not possess enzymes
that catalyse the analogous aminations of C-H bonds. The absence
of such enzymes limits the discovery of potential medicinal candidates
because nitrogen-containing groups are crucial to the biological
activity of therapeutic agents and clinically useful natural products.
In one prominent example illustrating the importance of incorporating
nitrogen-based functionality, the conversion of the ketone
of erythromycin to the -N(Me)CH2- group in azithromycin leads
to a compound that can be dosed once daily with a shorter treatment
time. For such reasons, synthetic chemists have sought catalysts
that directly convert C-H bonds to C-N bonds. Most currently used
catalysts for C-H bond amination are ill suited to the intermolecular
functionalization of complex molecules because they require excess
substrate or directing groups, harsh reaction conditions, weak or
acidic C-H bonds, or reagents containing specialized groups on the
nitrogen atom. Among C-H bond amination reactions, those forming
a C-N bond at a tertiary alkyl group would be particularly valuable,
because this linkage is difficult to form from ketones or alcohols
that might be created in a biosynthetic pathway by oxidation. Here
we report a mild, selective, iron-catalysed azidation of tertiary C-H
bonds that occurs without excess of the valuable substrate. The reaction
tolerates aqueous environments and is suitable for the functionalization
of complex structures in the late stages of a multistep
synthesis. Moreover, this azidation makes it possible to install a range
of nitrogen-based functional groups, including those from Huisgen
‘click’ cycloadditions and the Staudinger ligation. We anticipate
that these reactions will create opportunities to modify natural
products, their precursors and their derivatives to produce analogues
that contain different polarity and charge as a result of
nitrogen-containing groups. The reaction could also be used to help
identify targets of biologically active molecules by creating a point of
attachment (for example, to fluorescent tags or ‘handles’ for affinity
chromatography) directly on complex molecular structures.

To develop a mild method for the conversion of an alkyl C-H bond 
to an alkyl C-N bond, we focused on reactions of the hypervalent iodine 
reagent 1 containing an azide unit (Fig. 1). Such a reagent is related to 
hypervalent reagents commonly used for oxidation and is thermally
stable (up to 130 °C). It has sufficient thermodynamic potential to
convert alkyl C-H bonds to alkyl azides, but the published reactions 
have been limited to simple hydrocarbons, typically used in excess 
amounts, or activated C-H bonds at high temperatures in the presence 
of radical initiators. Thus, the current azidations of C-H bonds by this
reagent are not suitable for late-stage functionalization of complex
molecules. If an appropriate transition-metal catalyst for C-H bond
functionalization with this hypervalent iodine reagent could be identified,
then C-H bond amination reactions that incorporate an azide into 
complex molecules with site selectivity could be devised. Previously, 
iron- and manganese-porphyrin complexes were reported to catalyse 
the formation of alkyl azides using sodium azide and iodosobenzene or 
tBuOOH as oxidants, but the reactions were limited to hydrocarbons, an 
excess of the alkane (10 equiv.) was required, and a major by-product 
was the corresponding alcohol.

To identify a metal complex that would catalyse the azidation of C-H 
bonds with 1 under mild conditions, we investigated the reaction of 
cis-decalin. Various metal complexes, including metal porphyrins, were 


Figure 1 | Development of a catalyst for the azidation of aliphatic C-H
bonds. Top row, reactions studied: cis-decalin (1.0 equiv.) reacts with 1
(2.0 equiv.) and Fe(OAc)2* in CH3CN at 23 °C to give 3a† (major diastereomer
shown); treatment with (1) Pd/C, H2 and (2) pNO2BzCl gives 4a (80%).
*Conditions: 10.0 mol% Fe(OAc)2, 11.0 mol% ligand (L1-L11), cis-decalin
(0.2 mmol, 1.0 equiv.) and 1 (0.4 mmol, 2.0 equiv.), 23 °C, 1 ml of CH3CN.
The relative configuration of the major diastereomer of 3a was confirmed by
X-ray crystallographic analysis of 4a. †Combined yield of isomers. Middle and
bottom rows, structural formulae of ligands L1-L11 (Py = pyridine). Also
shown are yields and the ratios of isomers determined by gas chromatography
analysis with dodecane as internal standard; ratios are not corrected for
response factors of minor isomers.

Yields and isomer ratios: L1, 4%; L2, 7%, 3.2:1; L3, 10%, 3.0:1; L4, 11%,
4.3:1; L5, 3%, 2.3:1; L6, 5%, 3.6:1; L7, 10%, 2.8:1; L8, 20%, 4.0:1;
L9, 22%, 4.0:1; L10, 35%, 4.1:1; L11, 75%, 4.3:1.
tested as catalysts with the hydrocarbon as limiting reagent; only iron
complexes provided measurable yields of azide product at 23 °C (Supplementary
Table 1). Combinations of Fe(OAc)2 and various bi-, tri- and tetra-dentate
nitrogen ligands (L1-L5 in Fig. 1), including those that catalyse the
selective hydroxylation of aliphatic C-H bonds (L1 and L3), provided low
yields of the product 3a (Fig. 1; 3-11%).
However, substantial yields of 3a were observed with iron complexes 
of oxazoline-derived ligands, particularly with those possessing larger 
N-Fe-N ‘bite’ angles (L6-L9 in Fig. 1). Finally, we found that reac- 
tions of iron complexes containing tridentate nitrogen ligands of the 
pybox family (L10 and L11 in Fig. 1) provided product 3a in good yield 
with high selectivity for reaction at a tertiary C-H bond (75%, 4.3:1 
ratio of diastereomers). Reactions conducted in ethyl acetate (EtOAc) 
with the catalyst containing ligand L11 proceeded in good yield and
higher trans:cis selectivity (65%, 6.3:1; Supplementary Table 2). 
Reactions in acetonitrile were faster, but occurred with lower select- 
ivity. Reactions in a mixture of EtOAc and water (5:1) occurred simi- 
larly to those in pure EtOAc (63%, 6:1). 
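
The ligand screen summarized above can be tabulated and compared programmatically. The snippet below is a small illustrative sketch rather than anything used by the authors: it takes the yields and isomer ratios listed in Fig. 1, ranks the ligands by yield, and converts a ratio written as r:1 into the fraction of the major isomer. The names `LIGAND_SCREEN`, `major_fraction` and `rank_by_yield` are hypothetical.

```python
# Yield (%) and isomer ratio (major:minor, written as r:1) for each ligand,
# as listed in Fig. 1. No ratio is given for L1.
LIGAND_SCREEN = {
    "L1": (4, None),
    "L2": (7, 3.2),
    "L3": (10, 3.0),
    "L4": (11, 4.3),
    "L5": (3, 2.3),
    "L6": (5, 3.6),
    "L7": (10, 2.8),
    "L8": (20, 4.0),
    "L9": (22, 4.0),
    "L10": (35, 4.1),
    "L11": (75, 4.3),
}


def major_fraction(ratio):
    """Fraction of the major isomer for a two-component ratio given as ratio:1."""
    return ratio / (ratio + 1.0)


def rank_by_yield(screen):
    """Ligand names sorted from highest to lowest yield."""
    return sorted(screen, key=lambda name: screen[name][0], reverse=True)


for name in rank_by_yield(LIGAND_SCREEN):
    yield_pct, ratio = LIGAND_SCREEN[name]
    share = "n/a" if ratio is None else f"{100 * major_fraction(ratio):.0f}% major"
    print(f"{name}: {yield_pct}% yield, {share}")
```

On this reading, a 4.3:1 ratio corresponds to roughly 81% of the major isomer, and the 6.3:1 ratio obtained in EtOAc to roughly 86%.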

The azidation of the C-H bond of a series of hydrocarbons occurred 
in excellent yields with high selectivity for a tertiary C-H bond over the 
secondary and primary C-H bonds (Supplementary Table 3), setting the 
stage for azidation of the tertiary C-H bonds in molecules containing 
a series of functional groups. The azidation reaction with derivatives of 
dihydrocitronellol containing two electronically distinct tertiary C-H 
bonds and many secondary C-H bonds is shown in Fig. 2a. These reac- 
tions revealed the inherent electronic selectivity of the azidation reaction 
Figure 2 | Evaluation of the effect of the steric and electronic environment
on site selectivity for the azidation of aliphatic tertiary C-H bonds. Top row,
reaction studied. *Conditions: 10.0 mol% Fe(OAc)2, 11.0 mol% ligand L11,
substrate (on left: 0.2 mmol, 1.0 equiv.) and 1 (0.6 mmol, 3.0 equiv.), 23 °C,
1 ml of CH3CN. Isolated yields of major azide products are reported unless
mentioned otherwise. The ratios of isomers were determined by gas
chromatography analysis with dodecane as internal standard, and are not
corrected for response factors of minor isomers. a, Evaluation of site selectivity
for C-H bond azidation. The ratios reported reflect the site selectivity for
reaction at the two tertiary C-H bonds (3k to 3r) or the two benzylic C-H
bonds (3s, 3t). 3k-3t are the products formed from the azidation. †EtOAc was
used as solvent. b, C-H bond azidation of complex molecular scaffolds
containing multiple tertiary centres. Yields of products 3u-3z were also
compared to those of the benzoyl-peroxide-initiated reaction, as follows.
‡Conditions: 10.0 mol% BzOOBz, substrate (0.2 mmol, 1.0 equiv.) and 1
(0.6 mmol, 3.0 equiv.), 84 °C and 1,2-dichloroethane (1.0 ml) as solvent. The
yield and ratios of isomers were determined by gas chromatography analysis
with dodecane as internal standard. The ratios refer to the major product versus
minor product determined to be isomers by mass spectrometry. c, Late-stage
C-H bond azidation of a tetrahydrogibberellic acid derivative. As reaction at
top, except substrate is 5, and 6 is product. §Diastereoselectivity was measured
by 1H NMR of crude reaction mixture. ||Unidentifiable mixture of products.

Values legible from the panels (benzoyl-peroxide-initiated result in
parentheses): 3b, 80% (10%, 1:1‡); 3w, 53%, 10:1:1 (10%, 3:1‡); 3z, 24%,
5:1†§ (5%, 1:1‡); 15%||.
and its functional-group compatibility. The C-H bond azidation was
selective for reaction at the more electron-rich, remote, tertiary C-H 
bond, resulting in good isolated yields of the pure major isomers formed 
by the reaction (3k-3q in Fig. 2a). The regioselectivity of azidation at the 
two electronically distinct tertiary C-H bonds was influenced by the 
distance of the electron-withdrawing group from the proximal tertiary 
C-H bond (3k and 3l). In these cases, the regioselectivity of the C-H
bond azidation reaction mirrors the regioselectivity of a wide range of
oxidation reactions. Functional groups such as an alcohol protected
as an acetoxy group (3k and 3l), a bromide (3m), a nitrile (3n), an ester
(3o), a carboxylic acid (3p) and an amide (3q) were tolerated. Functional
groups like a carboxylic acid (3p) and an amide (3q) that could act
as directing groups influenced the selectivity by their electronic prop- 
erties, rather than by coordination to the catalyst. This higher reactivity 
of more electron-rich C-H bonds was also observed for cyclic structures. 
4-iso-Propylcyclohexanone underwent azidation with high regioselec- 
tivity for the more electron-rich of the two tertiary C-H bonds (3r in 
Fig. 2a). Investigation of the reactions of substituted arenes showed that 
tertiary and secondary benzylic C-H bonds were functionalized selectively 
in the presence of primary benzylic C-H bonds (3s and 3t in Fig. 2a). 

Having revealed high regioselectivity for C-H bond azidation, we 
assessed the potential of this reaction for azidation of the C-H bonds in 
more complex scaffolds containing several functional groups and strained 
rings that could react instead of a C-H bond or influence the identity of 
the C-H bond that undergoes azidation (Fig. 2b). Cyclic ketones pre- 
pared from (—)-carvone underwent azidation at the tertiary C-H bond 
remote from the ketone with high regioselectivity. These reactions occurred 
in the presence of epoxides, aziridines and cyclopropanes in good iso- 
lated yields (3u-3w in Fig. 2b). Minor azidation products were also 
observed by gas chromatography—mass spectrometry. These products 
were formed in amounts too small for isolation and were not characterized.
A mixture of diastereomers of α-dihydropinene (2b, 5:1) containing
three electronically similar, but sterically distinct, tertiary C-H bonds
reacted to give 80% isolated yield of a single isomer of azide 3b at room 
temperature. The strained four-membered ring was tolerated, suggesting 
a fast recombination of the likely radical intermediates. Acetoxymenthol 
containing two electronically similar tertiary C-H bonds reacted pref- 
erentially at the iso-propyl side chain to provide one major constitutional 
isomer in moderate isolated yield (3x in Fig. 2b). This selectivity, pre- 
sumably, results from the greater conformational flexibility of the iso- 
propyl side chain. Isomeric products from azidation of a different C-H 
bond were observed as minor products; again, these products were 
formed in quantities too low for full characterization, but were shown 
to be isomers by mass spectrometry. 

Unlike metal-catalysed insertions of nitrenes or carbenes into C-H bonds,
which are stereoretentive, the azidation gives a product in which the
configuration of the carbon bound to the azide is independent of the
configuration in the
reactant. However, this stereochemical outcome of our reactions allows 
one to use mixtures of diastereomeric reactants (see Supplementary 
Table 3 and Fig. 2; explicitly, 2b to form 3b and 5 to form 6) to provide 
one major diastereomer of the azide product (see below). 

Biologically active molecules containing multiple benzylic and tertiary 
C-H bonds also reacted selectively. Podocarpic acid and its derivatives 
have been reported to exhibit a wide variety of biological activities, 
including antileukaemic activity, inhibition of plant cell growth, and 
anti-inflammatory properties. A podocarpic acid derivative underwent 
selective azidation at the benzylic C-H bond in high yields and good 
diastereoselectivity (3y in Fig. 2b). Similar selectivity was also observed 
for the azidation of an oestrone (3z in Fig. 2b). 

The reaction of a gibberellic acid derivative illustrates the ability to 
conduct the azidation of complex structures (Fig. 2c). Gibberellic acid 
is a plant hormone that regulates growth and influences developmental 
processes, including cell elongation and germination. The gibberellic acid 
derivative 5 is a pentacyclic diterpene containing four tertiary C-H bonds. 
Based on the data just presented, the most electron-rich and sterically 
least hindered tertiary C-H bond, the one at carbon 8, should react 
selectively. In addition, the stereochemical outcome of the azidation of 
cis- and trans-decalin and α-dihydropinene (Supplementary Table 3 and
Fig. 2b) suggested that the configuration of the reactive centre in sub- 
strate 5 would have a negligible influence on the diastereomeric ratio 
of product 6 (Fig. 2c). Indeed, the azidation of a mixture of diaster- 
eomers of 5 provided the corresponding azide 6 as a single isolated dia- 
stereoisomer in 75% yield (Fig. 2c) from exo-attack of the azide unit at 
C(8). 

Finally, we also tested functionalizations of the complex scaffolds 
shown in Fig. 2b and c by reactions initiated with benzoyl peroxide. In 
all cases, poor yields and selectivities were observed from the reactions 
initiated by the peroxide. The substrates in Fig. 2b gave low yields in all 
cases and formed mixtures of isomeric products with poor selectivity. In 
addition, gibberellic acid derivative 5 decomposed to form a complex 
mixture of products in the 
presence of azide 1 and the peroxide. This distinct reaction course in 
the presence and absence of the iron catalyst suggests that the C-N bond 
is formed by two different processes in the two systems, and underscores 
the importance of the iron catalyst to create a reaction that is suitable for 
late-stage functionalization of complex molecules. 

Although detailed mechanistic studies have not yet been conducted, 
several observations reveal the general features of the mechanism. The 
site selectivities and stereochemical outcome of the azidation of cis- and 
trans-decalin and o-dihydropinene strongly suggest that a tertiary alkyl 
radical is generated (Supplementary Table 3 and Fig. 2b). Attempts to 
use radical clocks to assess more directly a potential alkyl radical were 


Table 1 | Experiments to evaluate the involvement of radical intermediates and the role of the iron catalyst 

[Reaction scheme: cis- or trans-decalin (1.0 equiv.) and azide 1 (2.0 equiv.); structures of the additives TEMPO, BHT and ABCN.] 

Entry  Substrate  Catalyst       Temperature (°C)  Additive  Yield (%)  Selectivity 
1      cis        Fe(OAc)2/L11   23                TEMPO*    3          NA 
2      cis        Fe(OAc)2/L11   23                BHT*      3          NA 
3†     cis        Fe(OAc)2/L11   80                NA        55         3.2 
4†     trans      Fe(OAc)2/L11   80                NA        43         32 
5†     cis        BzOOBz         80                ABCN‡     40         1.7 
6†     trans      BzOOBz         80                ABCN‡     33         17 

Conditions: 10.0 mol% catalyst, cis- or trans-decalin (0.2 mmol, 1.0 equiv.) and 1 (0.4 mmol, 2.0 equiv.), 2 h. The yield and ratios of isomers were determined by gas chromatography analysis with dodecane as internal standard and not corrected for response factors of minor isomers. NA, not applicable. 
* 1.0 equiv. was added. 
† EtOAc was used as solvent. 
‡ 1.0 mol% was added. 

Figure 3 | Introduction of a series of nitrogen-containing functionalities via 
C-H bond azidation. [Panel title: Heterocycles by sequential C-H bond amination.] 
a, Azide (1.0 equiv.), Fe cat., Fmoc-OSuc (1.5 equiv.), 
65 °C, benzene, 24 h (see ref. 19 for details). b, BZCN (2 equiv.), 130 °C, 48 h; 
c, CuSO4 (10 mol%), alkyne (2 equiv.), DMF, 48 h; d, CuSO4 (10 mol%), 


hampered by the poor reactivity of the appropriate substrates (see Sup- 
plementary Information for more details), but the proposed radical 
intermediate is consistent with the selectivity for azidation of the more 
electron-rich, less polarized, and thus weaker, tertiary C-H bonds 
(Fig. 2). Furthermore, addition of 1 equiv. of BHT or TEMPO (structures 
shown in Table 1), which are known to quench radicals, resulted 
in complete inhibition of the azidation reaction (Table 1, entries 1 and 2). 
Finally, the kinetic isotope effect (KIE) for azidation of ethylbenzene 
and ethylbenzene-d10, determined from initial reaction rates in separate 
vessels, was 5.0 ± 0.3, implying that cleavage of the C-H bond is 
the overall turnover-limiting step. 

To assess the role of the iron catalyst in this transformation, we com- 
pared the iron-catalysed azidations of the complex scaffolds in Fig. 2b 
and c with the reactions initiated by benzoyl peroxide. As noted above, 
poor yields were observed in all cases from the reactions initiated by the 
peroxide. The selectivities for formation of the azidation product from 
these reactions were lower than those of the iron-catalysed reactions. 
In addition, the diastereomeric ratio of product 3a formed from dec- 
alin and azide 1 in the presence of an organic radical initiator was dif- 
ferent from that formed from the iron-catalysed reaction conducted at 
the 80°C required for the peroxide-initiated process (Table 1, entries 
3-6). These differences in selectivities are all consistent with a different 
species forming the C-N bond during the iron-catalysed reaction and 
during the radical-initiated process. One possible origin of this differ- 
ence is formation of the C-N bond in the iron-catalysed process by 
reaction of an alkyl radical with an iron azide intermediate. 

This C-H bond azidation creates access to a range of synthetically 
useful functionalities attached to the original substrate by a C-N bond 
(Fig. 3). Primary amines containing a fully or partially substituted 
carbon atom (4a and 10), formed from azides 3a and 3z, and heterocycles 
such as tetrazole 9, formed from azide 3f, are obtained in good yields 
(Supplementary Table 3). The azides (for example, 3d and 3e, 
Supplementary Table 3) also undergo intramolecular cyclization under 
recently reported conditions to form various heterocycles, creating a 
route to nitrogen heterocycles, such as 11, from alkanes by two C-H bond 
amination reactions. 
Finally, the azide functionality undergoes Huisgen cycloaddition 
reactions. For example, an alkyne tethered to a fluorescent tag coupled 
with azido gibberellic acid derivative 6, and an alkyne attached to 
biotin coupled with azido podocarpic acid derivative 3y. These reactions 
illustrate how azidation and cycloaddition can create bioconjugation 
methods for visualization and identification of cellular targets of 
biologically active natural products. 

Much development of this C-H bond functionalization method re- 
mains to be accomplished, but a wide range of applications and exten- 
sions of the azidation reaction can be envisioned. The modularity of the 
catalyst creates further opportunities for site selectivity, and the stereo- 
chemical content of the ligand creates the potential for enantioselective 
azidation. The cycloadditions of azides could make possible conjugation 
to antibodies, and the simple reduction of the azide and the tolerance 
of the reaction to water create the potential to intercept biosynthetic 
sequences and install an amino group in place of a hydroxyl group in the 
final stages. Finally, we anticipate that this process will spur development 
of new classes of catalysts for the azidation of C-H bonds that could 
proceed by distinct mechanisms with distinct selectivities for primary, 
secondary and tertiary C-H bonds. As rhodium-catalysed amination 
reactions develop further, the two classes of systems for C-H bond 
amination should begin to provide a set of tools for incorporation of 
nitrogen atoms that parallels the existing set of tools for the chemical 
and enzymatic oxidation of C-H bonds. 

Effects of electron correlations on transport 
properties of iron at Earth’s core conditions 


Peng Zhang, R. E. Cohen & K. Haule 


Earth’s magnetic field has been thought to arise from thermal con- 
vection of molten iron alloy in the outer core, but recent density 
functional theory calculations have suggested that the conductivity 
of iron is too high to support thermal convection’, resulting in the 
investigation of chemically driven convection**. These calculations 
for resistivity were based on electron-phonon scattering. Here we 
apply self-consistent density functional theory plus dynamical mean- 
field theory (DFT + DMFT)’ to iron and find that at high tempera- 
tures electron-electron scattering is comparable to the electron-phonon 
scattering, bringing theory into agreement with experiments and 
solving the transport problem in Earth’s core. The conventional ther- 
mal dynamo picture is safe. We find that electron-electron scattering 
of d electrons is important at high temperatures in transition metals, 
in contrast to textbook analyses since Mott®*’, and that 4s electron 
contributions to transport are negligible, in contrast to numerous 
models used for over fifty years. The DFT+DMFT method should 
be applicable to other high-temperature systems where electron cor- 
relations are important. 

Recent DFT calculations by Pozzo et al. predict the electrical 
resistivity of iron to be (6.3-7.5) × 10⁻⁵ Ω cm at temperatures from 4,580 K 
to 6,400 K and pressures from 120 GPa to 340 GPa. The thermal 
conductivities they predicted are approximately three times the currently 
used values of 46-63 W m⁻¹ K⁻¹ in geophysics. The results of Pozzo 
et al. are consistent with previous DFT studies. The large electrical 
and thermal conductivities, however, challenge current Earth models. 

Efforts to constrain the transport properties of iron at core 
conditions have a long history. Elsasser estimated the resistivity of iron 
to be ρ ≈ 10.0 × 10⁻⁵ Ω cm at core conditions on the basis of geophysical 
arguments. By assuming the resistivity of iron to be constant along 
the melting line, Stacey and Anderson obtained ρ = 11.2 × 10⁻⁵ Ω cm 
at 4,971 K and 330 GPa (ref. 10). 

All previous calculations neglect electron-electron scattering. It has 
long been believed that resistivity in ordinary metals arises primarily 
from electron-phonon scattering, except at cryogenic conditions’. Cal- 
culations of resistivity from electron—electron scattering only now have 
become possible owing to developments in computational theory and 
technology and access to large-scale computational resources. The DFT 
+ DMFT approach has proved successful in providing results that are 
in good agreement with experiments for iron-bearing compounds’*"* 
and other strongly correlated materials. It quantitatively predicts prop- 
erties such as magnetic moments and the effective mass of a series of 
compounds in iron pnictides and iron chalcogenides. It also explains 
why superconducting gaps in these compounds are strongly Fermi- 
surface dependent. 

Our primary interest is in the properties of Earth’s core, so we first 
present resistivities at the core density of iron (throughout we refer to 
Earth’s core density from seismology of 13.04 g cm⁻³, or an atomic 
volume of 47.8 atomic units = 7.083 Å³) (Fig. 1). The resistivities calculated 
by Sha and Cohen, de Koker et al. and Pozzo et al. at the core 
conditions are approximately half the value obtained by extrapolating from 
the systematics of Stacey and Anderson and half the value obtained by 
extrapolating from previous shock compression experimental results. 
The thermal conductivity k of pure iron at core conditions obtained from 
their calculations ranges from 150 W m⁻¹ K⁻¹ to about 250 W m⁻¹ K⁻¹. 
Assuming a large thermal conductivity, the calculated heat conduction 
down the core adiabat is about 15 terawatts (TW), which overlaps the 
estimates of total heat loss from the core of 8-16 TW. No energy is 
left to drive the thermal convection in the geodynamo. To sustain the 
geodynamo, compositional convection is therefore required. However, 
this mechanism leads to a new paradox: Earth’s inner core solidification 
is believed to have started about one billion years ago, so before that 
there would be no compositional convection to drive the dynamo, yet 
we know that Earth’s geodynamo has existed for more than 3.4 billion 
years. 
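The unit conversion behind the quoted atomic volume can be checked with a short sketch. It assumes that "atomic units" of volume means cubic bohr radii; the function name and the CODATA bohr radius value are our own additions, not part of the original text.

```python
# Sketch: convert an atomic volume from atomic units (bohr^3) to cubic angstroms.
# Assumption: 1 atomic unit of length = 1 bohr radius = 0.529177210903 angstrom.
BOHR_IN_ANGSTROM = 0.529177210903


def atomic_units_to_cubic_angstrom(volume_au: float) -> float:
    """Return the volume in cubic angstroms for a volume given in bohr^3."""
    return volume_au * BOHR_IN_ANGSTROM ** 3


# Core-density atomic volume quoted in the text.
volume_a3 = atomic_units_to_cubic_angstrom(47.8)
print(f"47.8 atomic units = {volume_a3:.3f} cubic angstroms")
```

The result rounds to the 7.083 Å³ quoted for the core density.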

We find that at high temperatures the resistivity from electron-electron 
scattering, $\rho_{\mathrm{e\text{-}e}}$, computed with DFT + DMFT is of the same 
order as that from electron-phonon scattering, $\rho_{\mathrm{e\text{-}ph}}$, 
computed with DFT (Fig. 1). The sum of the two parts of the resistivity, 
from the electron-electron and the electron-phonon scattering, is in 
agreement with earlier geophysical estimates; including both contributions 
recovers the traditional resistivity values. We checked the systematics of 
Stacey and Anderson using resistivity results at other densities as well. 
Considering the uncertainty of iron’s melting temperature (about ±500 K), 
the resistivity of iron is around 13.5 × 10⁻⁵ Ω cm along its melting line. 
Our results support Stacey and Anderson’s systematics. 

Figure 1 | Resistivity versus temperature of hcp iron at Earth’s core density. 
The black vertical line indicates Earth’s core temperature. The black squares 
are the extrapolations to this density using the systematics of Stacey and 
Anderson. The green diamond is an interpolation to this density of previous 
shock compression results. The DFPT resistivity line is from the linear 
extrapolation of low-temperature results. The DFT + molecular dynamics 
(MD) resistivities are extracted from refs 2,3. The statistical error bars of 
DFT + DMFT and the total resistivities are smaller than their symbols. Values 
are given in the Extended Data Tables 1 and 2. All error bars are 1σ. 

Figure 2 | Our computed resistivities of hcp iron are compared with 
experimental results. a, Resistivity versus temperature along the Hugoniot 
from shock data, electron-phonon scattering from DFPT calculations 
(violet), and electron-phonon scattering (DFPT) plus electron-electron 
scattering (DFT + DMFT) (red). The blue line is the linear fit of shock 
compression data, and the black lines are the 95% mean confidence interval. 
b, Resistivity versus pressure at T = 300 K and at P = 65 GPa, T = 383 K. 
Previous diamond anvil cell experimental results are compared with the 
DFPT calculations of ref. 1 (violet dashed lines) and ref. 4 (orange dashed lines). 
The data of refs 21,22 were analysed in ref. 1. Red plus symbols indicate the 
sum of DFPT electron-phonon and our DFT + DMFT electron-electron 
resistivities. The statistical 1σ error bars of total resistivities are smaller than 
their symbols. Values are given in Extended Data Tables 1 and 2. 


Direct comparisons of our results with shock compression and dia- 
mond anvil cell experimental resistivities are provided in Fig. 2. Along 
the Hugoniot, we find that our electron-phonon plus electron-electron 
resistivities are linear at high temperatures (Fig. 2a). So we fitted the 
shock data along the Hugoniot to a straight line as well, and also derived 
the 95% mean confidence interval from the data. It is not justified to fit 
a higher-order function to the data given the experimental scatter. The 
total resistivity $\rho_{\mathrm{e\text{-}ph}} + \rho_{\mathrm{e\text{-}e}}$, which is the sum of the density 
functional perturbation theory (DFPT) and DFT + DMFT results, nicely 
overlaps the best-fitting line of the shock data. In contrast, the 
$\rho_{\mathrm{e\text{-}ph}}$ line falls below the confidence interval, showing that it is a 
poor model to explain the results. We note that the error bars of Keeler 
et al. are too small, judged both against the scatter of their own data 
and against the data of Bi et al., which provided no error estimates. 
Furthermore, Bi et al. suggested that the values of Keeler et al. were 
systematically low, owing to shunting of the current. Our total 
resistivities $\rho_{\mathrm{e\text{-}e}} + \rho_{\mathrm{e\text{-}ph}}$ agree well within experimental 
error with diamond anvil cell experimental results at room 
temperature, and also at the P = 65 GPa, T = 383 K point of ref. 4. Our 
results are in slightly better agreement with Seagle et al. than with 
Gomi et al., but this difference probably represents the experimental 
uncertainty, since both are state-of-the-art experiments. At room 
temperature the resistivity from the electron-electron scattering is 
insignificant relative to that from the electron-phonon scattering. We expect 
resistivity contributions also from defects and grain boundaries, and the 
DFT electron-phonon values do not include contributions from 
antiferromagnetic correlations, which are expected to be important at 
moderate to low temperatures. We find that the temperature dependence 
of the resistivity is much more important than changes with pressure. 

When the mean-free path is comparable to the lattice spacings, 
saturation in resistivity is expected at the Ioffe-Regel value for the 
electron-phonon component. We estimate the saturation resistivity to 
be 11.4 × 10⁻⁵ Ω cm at the core density, which is higher than our 
estimated resistivity for electron-phonon scattering, 
$\rho_{\mathrm{e\text{-}ph}}$ = 9.15 × 10⁻⁵ Ω cm at 6,000 K. Since the resistivity from 
electron-electron scattering may exceed the Ioffe-Regel value, we do not 
expect saturation effects to be important at the core conditions. 

We estimate the thermal conductivity using the Wiedemann-Franz 
law (k = LT/ρ, with Lorenz number L = 2.44 × 10⁻⁸ W Ω K⁻²), 
giving about 105 W m⁻¹ K⁻¹ at temperatures from 4,000 K to 7,000 K. 
Earth’s core is not pure crystalline iron but is liquid and contains light 
elements of the order of 10% by mass. Since the light elements will de- 
crease the electrical and thermal conductivities, this thermal conduc- 
tivity is close to previously accepted values’®. Furthermore, resistivity 
increases with melting”, so that there is now no problem driving the 
dynamo with thermal convection. Although the absolute values of the 
core resistivity and thermal conductivity cannot be constrained exactly 
owing to uncertainties in temperature and composition, it is clear (1) that 
electron-electron scattering is an important component, and (2) that 
including electron—electron scattering removes any problem with core 
conductivity being too high to explain the geodynamo. Thus, the trans- 
port crisis is solved. 
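As a rough numerical illustration of the Wiedemann-Franz estimate above, the sketch below evaluates k = LT/ρ with the standard Sommerfeld value of the Lorenz number. The function name is ours, and the input resistivity of 13.5 × 10⁻⁵ Ω cm at 6,000 K is an illustrative choice taken from the melting-line value discussed in the text, not a computed data point.

```python
# Sketch of the Wiedemann-Franz estimate k = L * T / rho.
# Assumption: L is the Sommerfeld value of the Lorenz number.
LORENZ_NUMBER = 2.44e-8  # W Ohm K^-2


def thermal_conductivity(resistivity_ohm_cm: float, temperature_k: float) -> float:
    """Return the thermal conductivity in W m^-1 K^-1."""
    resistivity_ohm_m = resistivity_ohm_cm * 1e-2  # convert Ohm cm to Ohm m
    return LORENZ_NUMBER * temperature_k / resistivity_ohm_m


# Illustrative input: 13.5e-5 Ohm cm at 6,000 K.
k_example = thermal_conductivity(13.5e-5, 6000.0)
print(f"k = {k_example:.1f} W m^-1 K^-1")
```

With these inputs the estimate is close to 108 W m⁻¹ K⁻¹, in line with the value of about 105 W m⁻¹ K⁻¹ quoted above.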

Contrary to the general belief that at high temperatures the resistivity of 
transition metals comes mainly from electron-phonon scattering, our 
DFT + DMFT computations have shown that the electron-electron 
scattering is as important as the electron-phonon scattering in hexagonal 
close packed (hcp) iron. According to the Fermi-liquid theory, at low 
temperature T the resistivity of metals from electron-electron 
scattering, $\rho_{\mathrm{e\text{-}e}}(T)$, is proportional to $T^2$. Mott suggested that the 
$T^2$ behaviour would have a broad crossover region before saturating, but 
gave no theory for the form, nor has one yet been developed. Above 2,000 K, 
we find $\rho_{\mathrm{e\text{-}e}}(T)$ in Fig. 1 to be linear with temperature at constant volume. 
Interestingly, the linear T dependence of resistivity is widely observed 
in correlated materials, including high-temperature superconducting 
cuprates, heavy fermion and other correlated metals. In DMFT 
simulations of the Hubbard model, linear-T resistivity arises from 
the linear-T dependence of quasiparticle weight at temperatures above 
the Fermi-liquid coherent energy scale. However, in hcp iron we find 
that the quasiparticle weight is only weakly dependent on temperature. 
As shown in Fig. 3a, the conduction electron scattering rate Γ is linear 
with T above 2,000 K. 

Figure 3 | Scattering rates, density of states and spectral function at Earth’s 
core density of hcp iron. a, Orbitally resolved scattering rates as a function 
of temperature. $\Gamma_{d_{z^2}}$, $\Gamma_{d_{x^2-y^2+xy}}$ and $\Gamma_{d_{xz+yz}}$ represent the scattering rates on 
respective d orbitals. b, Density of states of s, $d_{z^2}$, $d_{x^2-y^2+xy}$, $d_{xz+yz}$ and total 
orbitals at 6,000 K. c, Spectral function A(k, E), where k is the wave vector and 
E is the electron energy relative to the Fermi level, at 6,000 K. The x axis is 
the k-path in the first Brillouin zone of the hcp lattice. The statistical 1σ error 
bars of the scattering rates are smaller than their symbols. 

Since the density of s electrons at the Fermi level, $N_s(E_F)$, is very small 
(Fig. 3b) and in our calculations $N_s(E_F)/[N_s(E_F) + N_d(E_F)] < 1\%$ (where 
$E_F$ is the Fermi energy and $N_s$ and $N_d$ are the partial densities of states) at all 
temperatures, $\rho_{\mathrm{e\text{-}e}}(T)$ is determined mostly by the scattering rates of 
d electrons in hcp iron. We suggest that the linear-T scattering in iron 
arises from scattering off thermally excited local states which originate 
from strong electron-electron interactions, a process not included in 
Fermi-liquid theory. In contrast, the linear resistivity from 
electron-phonon scattering comes from the near-linear dependence of the number 
of phonons (the quantized lattice vibrations) on temperature. 

We find correlated bands in the low-energy region, all being iron 3d 
states (Fig. 3b, 3c). The correlated states at the Fermi level are the origin 
of large electron-electron scattering and substantial electron-electron 
resistivity. We expect some other transition metals to have incoherent 
states at $E_F$ and to show similar behaviour, and those with sharp 
quasiparticle states at $E_F$ to have normal behaviour with dominant 
electron-phonon resistivity. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 
Code availability. The DFT + DMFT code was developed by K.H. and is available 
at http://hauleweb.rutgers.edu/downloads/. 

The DFT+DMFT formalism. In the DFT + DMFT method a functional (equation 
118 of ref. 33) that includes all local two particle irreducible skeleton diagrams 
is optimized. The interaction Hamiltonian is given by the Slater form (equation 28 
of ref. 7) with the Slater integrals $F^0 = U$, $F^2 = (14/1.625)J$ and $F^4 = (8.75/1.625)J$, 
where U is the Hubbard parameter and J is the Hund’s coupling. The double 
counting energy $E_{\mathrm{dc}}$ is calculated from the fully localized limit formula 

$$E_{\mathrm{dc}} = U\left(n^{0}_{\mathrm{cor}} - \tfrac{1}{2}\right) - \tfrac{J}{2}\left(n^{0}_{\mathrm{cor}} - 1\right) \qquad (1)$$

and $n^{0}_{\mathrm{cor}}$ is the nominal electron occupancy of the correlated atom (iron in 
this case). We also tested the around-mean-field 
double-counting and found an increment of resistivity of up to 16% at core 
conditions, which is not significant. Our DFT calculations show $n_{\mathrm{cor}} = 6.6$ at the core 
density of iron, so we choose $n^{0}_{\mathrm{cor}} = \mathrm{Int}(n_{\mathrm{cor}}) = 7$, where Int means choosing the 
nearest integer number of $n_{\mathrm{cor}}$. We tested $n^{0}_{\mathrm{cor}} = 6$ and 8, but did not find large 
changes, and $n^{0}_{\mathrm{cor}} = 7$ gives the lowest resistivity. More details are given in ref. 7. In 
our calculation, we choose an energy window of ±10 eV around the Fermi level $E_F$ 
for the projector. We use the continuous time quantum Monte Carlo method to 
sample all diagrams in the hybridization expansion, as described in detail in refs 37 
and 38. 
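The parameter relations in this subsection can be evaluated with a few lines of code. The sketch below assumes the fully localized limit form E_dc = U(n − 1/2) − (J/2)(n − 1) with the nominal occupancy n = 7, and uses the U and J values quoted in these Methods; the function names are illustrative and are not taken from the DFT + DMFT code.

```python
# Sketch: Slater integrals and fully-localized-limit double counting.
# Assumptions: F0 = U, F2 = (14/1.625) J, F4 = (8.75/1.625) J, and
# E_dc = U (n - 1/2) - (J/2) (n - 1). Function names are hypothetical.

def slater_integrals(hubbard_u: float, hund_j: float) -> tuple:
    """Return (F0, F2, F4) in the same energy units as U and J."""
    f0 = hubbard_u
    f2 = 14.0 / 1.625 * hund_j
    f4 = 8.75 / 1.625 * hund_j
    return f0, f2, f4


def fll_double_counting(hubbard_u: float, hund_j: float, n_nominal: float) -> float:
    """Return the fully-localized-limit double-counting energy."""
    return hubbard_u * (n_nominal - 0.5) - 0.5 * hund_j * (n_nominal - 1.0)


# Values used in the text: U = 5 eV, J = 0.943 eV, nominal occupancy 7.
f0, f2, f4 = slater_integrals(5.0, 0.943)
e_dc = fll_double_counting(5.0, 0.943, 7.0)
print(f"F0 = {f0:.3f} eV, F2 = {f2:.3f} eV, F4 = {f4:.3f} eV, E_dc = {e_dc:.3f} eV")
```

The two prefactors are chosen so that (F2 + F4)/14 returns J, the usual d-shell convention with F4/F2 = 0.625.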

The all-electron LAPW (linearized augmented plane wave) WIEN2K code is 
used for the DFT calculations, with the Wu-Cohen exchange-correlation potential. 
A test at core conditions with the Perdew-Burke-Ernzerhof exchange-correlation 
functional changed the resistivity by only 2.5%, indicating that our 
results are robust. The k-space summation is on a 12 × 12 × 12 grid using a 
modified tetrahedron integration scheme. The cut-off energy separating the core from 
valence states is −9.0 Ry. $R_{\mathrm{MT}}K_{\mathrm{max}}$ is 9.0 (where $R_{\mathrm{MT}}$ is the smallest atomic sphere 
radius and $K_{\mathrm{max}}$ is the maximum number of wavevectors used) and the magnitude 
of the largest vector GMAX is 19.0. There is no spin-orbital coupling/splitting in our 
calculation. 

The DFT + DMFT method iterates as follows: (1) the lattice problem is solved as 
in DFT, but with added self-energy for the correlated states (which is zero for the 
first iteration), which makes the problem non-Hermitian and frequency 
dependent. The eigenvalues, wavefunctions, charge density and potential are output. Step 1 
can be iterated as an inner loop. (2) The impurity levels $E_{\mathrm{imp}}$ and the hybridization 
function Δ(ω) between the lattice and the impurity are computed and input into 
the continuous time quantum Monte Carlo impurity solver to find the DMFT 
solution. Step 2 can be iterated as an inner loop. (3) The self-energy and the electron 
density are updated; the new self-energy and electron density are inserted into steps 
(1) and (2) for the next DFT + DMFT iteration. 

For Earth’s core density of hcp iron we use 13.04 g cm⁻³ and the corresponding 
lattice volume of 47.8 atomic units. The pressure-temperature relationship at 
this density is presented in Extended Data Fig. 1. The lattice parameter ratio c/a is 
1.615 at all volumes. We used a Hubbard U = 5 eV and Hund’s coupling J = 
0.943 eV on the basis of numerous previous studies of iron compounds. We checked 
that U = 2 eV gives very similar results; the resistivity is weakly dependent on U 
(the largest difference is smaller than 12%). Reducing J to zero halves the computed 
electron-electron resistivity, showing that the Hund’s coupling is quantitatively im- 
portant but not solely responsible for the scattering. Reducing U and J to near zero 
reduces the electronic resistivity to near zero, as expected. Without the electron- 
electron correlations, the electron-phonon interactions would dominate, as prev- 
iously believed. 

Accuracy of the DFT + DMFT method. The DFT + DMFT method combines 
the accurate treatment of the many-body physics as well as a fully self-consistent 
treatment of the crystal and atomic bonding and hybridization. The DFT + DMFT 
method is one of the most important advances in numerical simulation of con- 
densed matter physics. By introducing correlation effects, this method works in the 
region where DFT fails to predict experimental results. Its accuracy has been 
proved in research on various correlated electron materials from transition metals 
and their compounds to heavy fermion materials**’. At ambient conditions the 
transition-metal oxide FeO is an insulator but DFT predicts it to be a metal**. In 
contrast, DFT + DMFT not only makes FeO an insulator at ambient conditions, 
but also successfully predicted the existence of a metallic phase at high pressure*”. 
DFT + DMFT has also been used in research on heavy fermion materials. Using 
DFT + DMFT, Shim et al. identified the ground-state electronic configurations of 
curium and plutonium. They found that curium has a single-valence ground state 
with magnetic ordering, whereas plutonium has a ground state that comes from 
superposition of two atomic valences. The different magnetic properties of curium 
and plutonium are explained by the interplay between their ground-state electronic 
configurations, the electronic itinerancy and localization, as well as the spin-orbit 
coupling. The same group also investigated CeIrIn5 using DFT + DMFT, where 
they found the numerically calculated temperature resolved spectral functions to 
be in good agreement with experimental results. The consistency between 
numerical simulations and experimental results enabled them to explain the 
experimentally observed features in the optical conductivity. The accuracy of DFT + DMFT 
predictions is not limited to the single-particle level. In an inelastic neutron 
scattering experiment on the iron pnictide BaFe1.9Ni0.1As2 by ref. 50, the 
experimental data were compared with the dynamical magnetic susceptibility from 
DFT + DMFT calculations. They found systematic consistency between the 
experimental and the numerical results at different energy and momentum slices. From these 
results, they confirm that magnetic excitations in the iron pnictide BaFe1.9Ni0.1As2 
are partially localized, which indicates the strongly correlated nature of this high- 
temperature superconducting material. 

Scattering rate and analytic continuation. In our DFT + DMFT calculations the 
scattering rate of d electrons at the Fermi level is given by: 

$$\Gamma_{\alpha} = -Z_{\alpha}\,\mathrm{Im}\,\Sigma_{\alpha}(\omega)\big|_{\omega=0} \qquad (2)$$

in which $Z_{\alpha} = [1 - \partial\,\mathrm{Re}\,\Sigma_{\alpha}(\omega)/\partial\omega]^{-1}\big|_{\omega=0}$, α is the d-orbital index of $d_{z^2}$, 
$d_{x^2-y^2+xy}$ and $d_{xz+yz}$, and Σ(ω) is the self-energy in real frequency from maximum 
entropy (MaxEnt) analytic continuation (Extended Data Figs 2 and 3). 
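To make the definition concrete, the following sketch evaluates the quasiparticle weight Z = [1 − ∂ReΣ/∂ω]⁻¹ and the scattering rate Γ = −Z ImΣ, both at ω = 0, for a toy self-energy. The linear model Σ(ω) = −0.5ω − 0.1i eV and the function names are purely hypothetical, chosen so that the result can be checked by hand; they are not output of the calculations described here.

```python
# Sketch: scattering rate from a real-frequency self-energy at omega = 0.
# Assumption: Gamma = -Z * Im Sigma(0) with Z = 1 / (1 - dRe Sigma/d omega at 0).
# The toy self-energy below is hypothetical, not data from the paper.

def quasiparticle_weight(re_sigma, step: float = 1e-4) -> float:
    """Z from a central finite difference of Re Sigma around omega = 0."""
    slope = (re_sigma(step) - re_sigma(-step)) / (2.0 * step)
    return 1.0 / (1.0 - slope)


def scattering_rate(re_sigma, im_sigma, step: float = 1e-4) -> float:
    """Gamma = -Z * Im Sigma(omega = 0)."""
    return -quasiparticle_weight(re_sigma, step) * im_sigma(0.0)


def toy_re_sigma(omega: float) -> float:
    return -0.5 * omega  # slope -0.5 gives Z = 1 / 1.5


def toy_im_sigma(omega: float) -> float:
    return -0.1  # constant damping in eV


z_toy = quasiparticle_weight(toy_re_sigma)
gamma_toy = scattering_rate(toy_re_sigma, toy_im_sigma)
print(f"Z = {z_toy:.4f}, Gamma = {gamma_toy:.4f} eV")
```

In practice the self-energy would come from the analytic continuation on a dense real-frequency mesh rather than from a closed-form model.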

Another DFT + DMFT computation for hcp iron (atomic volume of 47.6 atomic 
units, hcp lattice ratio c/a = 1.6, U = 3.37 eV, J = 0.93 eV, projection energy 
window [−10.8 eV, 4.0 eV] around the Fermi level, around-mean-field double 
counting) is presented in ref. 52. We duplicate all of their results by exactly following their 
methods. In ref. 52 the scattering rate of hcp iron is calculated by extrapolating 
the imaginary frequency to zero, $\Gamma(T, i0^{+}) = -Z(T, i\omega_n)\,\mathrm{Im}[\Sigma(T, i\omega_n)]\big|_{i\omega_n \to i0^{+}}$. 
They claim that hcp iron is in the Fermi-liquid state up to 5,800 K with a quadratic 
scattering rate in temperature. Although we agree with their results at room 
temperature, we find very different behaviour at high temperatures. Their extrapolation 
in imaginary frequency is poorly constrained at high temperatures. At Earth’s core 
temperature (T = 6,000 K), the first positive imaginary frequency is at $\omega_0$ = 1.62 eV, 
and there are only two points that could be used in extrapolation below 5 eV. This 
makes the self-energy, and consequently the scattering rate, at $i0^{+}$ depend heavily 
on the choice of extrapolation. In Extended Data Fig. 2, three methods (linear, cubic 
and Akima) are used to extrapolate the imaginary part of the self-energy to $i0^{+}$. The 
results from the three extrapolations are distributed over a wide range. In Extended 
Data Fig. 2, the imaginary part of self-energy at $i0^{+}$ ranges from −0.23 eV to 
−0.12 eV, giving 100% uncertainty for the absolute values. We attribute the 
conclusion of ref. 52 that hcp iron is a Fermi liquid at high 
temperatures, which is contrary to our results, to this insufficient accuracy in 
their analysis. In contrast, the self-energy from our MaxEnt has a very dense mesh, 
where the smallest energy scale is 0.0025 eV, as shown in the inset of Extended 
Data Fig. 3. There is a ω = 0 point, so extrapolation is no longer needed. 
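The sparsity of the imaginary-frequency mesh at core temperature follows directly from the fermionic Matsubara frequencies, ω_n = (2n + 1)π k_B T. A minimal sketch, with a function name of our own choosing and the CODATA Boltzmann constant in eV per kelvin:

```python
import math

# Sketch: fermionic Matsubara frequencies omega_n = (2n + 1) * pi * k_B * T.
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def matsubara_frequencies(temperature_k: float, count: int) -> list:
    """Return the first `count` positive fermionic Matsubara frequencies in eV."""
    return [(2 * n + 1) * math.pi * K_B_EV_PER_K * temperature_k for n in range(count)]


freqs = matsubara_frequencies(6000.0, 5)
below_5_ev = [w for w in freqs if w < 5.0]
print("first frequencies (eV):", [round(w, 2) for w in freqs])
print("points below 5 eV:", len(below_5_ev))
```

At 6,000 K the first frequency is about 1.62 eV and only two frequencies lie below 5 eV, which is why an extrapolation to zero frequency is so weakly constrained.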

Three independent methods were used to check the stability and accuracy of our 
analytic continuation; we used both the Padé and the singular value 
decomposition methods in addition to MaxEnt. As presented in Extended Data Fig. 3 and its 
inset, the imaginary part of the self-energies from the three analytic continuations 
agree in the low-energy region needed for the conductivity. At energies around the 
Fermi level, the three analytic continuations give identical self-energies. This proves 
that our MaxEnt results are precise and stable. 
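Optical conductivity calculation in DMFT. The optical conductivity is calculated 
using the self-energy on the real frequency axis from the MaxEnt, with the low 
frequency limit giving the direct-current conductivity. In DMFT, since the vertex 
corrections to conductivity can be safely omitted, the formula we use in the 
optical conductivity calculation is 

$$\mathrm{Re}\,\sigma_{\mu\nu}(\omega) = \pi e^{2} \sum_{\mathbf{k}} \int \mathrm{d}\varepsilon\, \frac{f(\varepsilon-\omega) - f(\varepsilon)}{\omega}\, \mathrm{Tr}\left[\rho_{\mathbf{k}}(\varepsilon)\, v^{\mu}_{\mathbf{k}}\, \rho_{\mathbf{k}}(\varepsilon-\omega)\, v^{\nu}_{\mathbf{k}}\right] \qquad (3)$$

where μ, ν are direction indices, $\rho_{\mathbf{k}}(\varepsilon) = [G^{\dagger}_{\mathbf{k}}(\varepsilon) - G_{\mathbf{k}}(\varepsilon)]/(2\pi i)$, the velocity vector 
$v^{\mu,ij}_{\mathbf{k}} = -(i/m)\langle\psi_{\mathbf{k}i}|\nabla_{\mu}|\psi_{\mathbf{k}j}\rangle$ and i, j are orbital indices, the fermionic distribution function 
$f(\varepsilon) = (e^{\beta\varepsilon} + 1)^{-1}$, with β the inverse temperature, and the trace is over all valence states. 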

Equation of state. Our resistivity results at core conditions are independent of any 
equation of state. We use the known density and the temperature of Earth’s core. 
For our comparisons with experiments under other conditions, we estimate the 
pressures of our DFT + DMFT calculations from the thermal equation of state 
given by ref. 31. The Hugoniot line is from the same paper*’. The pressure- 
temperature relationship at Earth’s core density and along the Hugoniot line are 
shown in Extended Data Fig. 1. 

Extrapolations. In Fig. 1 we estimated the resistivities of iron at Earth’s core den- 
sity using the systematics of ref. 10 as well as three sets of parameters (P, T,,) (where 
Tm is the melting temperature of iron) along iron’s melting curve*~*. Stacey and 


Wy i) and i,j are orbital indices, the fermionic distribution function 
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Anderson’? assume that the resistivity of iron (1) is constant along the melting line 
at $13.5 \times 10^{-5}\ \Omega$ cm and (2) is a linear function of temperature at constant
pressure. In Extended Data Table 1a the temperature $T_c$ at the core density
corresponding to $(P, T_m)$ is derived from the equation of state in ref. 31 (see Extended
Data Fig. 1). The resistivities at the core density are given by $\rho_c = \rho_m T_c/T_m$, where
$\rho_m$ is the melting resistivity. We also tested the effectiveness of Stacey and
Anderson’s systematics (ref. 10) using our resistivity data at the atomic volume of 45 atomic units
and temperature 6,000 K. The derived melting resistivity is always close to
$13.5 \times 10^{-5}\ \Omega$ cm. Interestingly, our calculations support the systematics of Stacey and
Anderson, in spite of the importance we find of electron-electron scattering, and
their assumption that electron-phonon scattering would give scaling with the
melting curve.
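
The scaling in this paragraph can be checked numerically. The sketch below is only an illustration: it assumes a melting resistivity of 13.5 in units of 10⁻⁵ Ω cm and takes the three (P, T_m, T_c) triples from Extended Data Table 1a. The function and variable names are hypothetical.

```python
# Scaling used in the text: resistivity is constant along the melting line
# and linear in temperature at constant pressure, so rho_c = rho_m * T_c / T_m.
RHO_MELT = 13.5  # assumed melting resistivity, units of 1e-5 ohm cm

# (P in GPa, T_m in K, T_c in K) as listed in Extended Data Table 1a
POINTS = [(235, 5495, 2148), (243, 5572, 3141), (260, 5737, 5361)]


def core_resistivity(t_core, t_melt, rho_melt=RHO_MELT):
    """Return rho_c = rho_m * T_c / T_m in the same units as rho_melt."""
    return rho_melt * t_core / t_melt


def extrapolate_all(points=POINTS):
    """Apply the scaling to every (P, T_m, T_c) triple, rounded to 2 decimals."""
    return [round(core_resistivity(tc, tm), 2) for _, tm, tc in points]


for (p, tm, tc), rho in zip(POINTS, extrapolate_all()):
    print(f"P = {p} GPa, T_m = {tm} K, T_c = {tc} K -> rho_c = {rho}")
```

With these inputs the three values round to the resistivities tabulated in Extended Data Table 1a.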

The shock compression experiment extrapolation point in Fig. 1 is derived from
the formula:

$$\rho(T) = 1.58 + 2.59 \times 10^{-3}\,T \qquad (4)$$

which is the best linear fit of resistivity data from previous shock experiments by
Keeler et al. and Bi et al. (ref. 17). $\rho(T)$ is in units of $10^{-5}\ \Omega$ cm. The pressure and
temperature of this point come from the Hugoniot line at Earth’s core density, as given
in Extended Data Fig. 1.
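
As a quick consistency check, the linear fit can be evaluated at the Hugoniot crossing temperature. This sketch assumes the slope is 2.59 × 10⁻³ per kelvin (the exponent that reproduces the value in Extended Data Table 1b) and that resistivity is in units of 10⁻⁵ Ω cm. The function name is hypothetical.

```python
def shock_fit_resistivity(temperature_k):
    """Linear fit of equation (4): rho(T) = 1.58 + 2.59e-3 * T.

    Result is in units of 1e-5 ohm cm; the 1e-3 exponent is an assumption
    chosen because it reproduces the tabulated extrapolation point.
    """
    return 1.58 + 2.59e-3 * temperature_k


# Temperature where the Hugoniot crosses Earth's core density
# (Extended Data Fig. 1): T = 6,658 K at P = 269.9 GPa.
HUGONIOT_T = 6658.0
print(shock_fit_resistivity(HUGONIOT_T))
```

The result agrees with the tabulated 18.83 to within the rounding of the two fit coefficients.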
Extended Data Figure 1 | Pressure versus temperature relationship of hcp iron at Earth’s core density and along the Hugoniot line (ref. 31). The two lines cross at P = 269.9 GPa, T = 6,658 K at Earth’s core density.

Extended Data Figure 2 | Extrapolation of ImΣ(iω_n) to zero imaginary frequency. The self-energy is from the d_z² orbital of hcp iron at Earth’s core density and 6,000 K. Three extrapolation methods are used: linear, cubic spline and Akima spline. The imaginary part of self-energy at i0⁺ ranges from −0.23 eV to −0.12 eV.

Extended Data Figure 3 | The imaginary part of self-energies in real frequency on the d_z² orbital of hcp iron at Earth’s core density and 6,000 K. The self-energies are from three analytic continuation methods: MaxEnt, Padé and singular value decomposition. The inset shows the same imaginary part of self-energies in energy range [−0.01 eV, 0.01 eV] around the Fermi level. The self-energies from three analytic continuation methods agree at the low-energy region.
Extended Data Table 1 | Resistivities from extrapolations and previous experiments

a
P (GPa)    Tm (K)    Tc (K)    ρ (10⁻⁵ Ω cm)
235        5495      2148      5.28
243        5572      3141      7.61
260        5737      5361      12.62

b
V (bohr³/atom)    P (GPa)    T (K)    ρ (10⁻⁵ Ω cm)    ± (10⁻⁵ Ω cm)
47.8              269.9      6658     18.83            3.07

c
Source             V (bohr³/atom)    P (GPa)    T (K)    ρ (10⁻⁵ Ω cm)
de Koker et al.    47.8              264.9      6000     6.56
Pozzo et al.       47.8              264        5865     6.65

d
V (bohr³/atom)    P (GPa)    T (K)    ρ (10⁻⁵ Ω cm)    ± (10⁻⁵ Ω cm)
74.2              17         303      2.78             0.31
67.0              37         585      3.19             0.31
63.0              44.4       728      3.59             0.24
61.7              64.3       1240     3.95             0.28
57.2              110        2180     5.35             0.45
55.4              140        2950     6.41             0.42

e
V (bohr³/atom)    P (GPa)    T (K)    ρ (10⁻⁵ Ω cm)
57.9              101.1      2010     6.9
55.1              146.7      3360     15.2
52.3              208.0      5220     13.1

a, The extrapolated resistivities in Fig. 1 at Earth’s core density using the systematics of ref. 10. b, The extrapolated resistivity in Fig. 1 at Earth’s core density on the Hugoniot. c, The resistivities from DFT + MD
calculations in Fig. 1 at Earth’s core density, extracted from refs 2 and 3. d, The atomic volumes, pressures, temperatures and resistivities from shock compression experiments in Fig. 2a. e, The atomic volumes,
pressures, temperatures and resistivities from shock compression experiments by ref. 17 in Fig. 2a.
Extended Data Table 2 | The atomic volumes, pressures, temperatures and resistivities from our study in Fig. 1 and Fig. 2

All resistivities are in units of 10⁻⁵ Ω cm.

a
V (bohr³/atom)    P (GPa)    T (K)    ρep      ρee      ±ρee      ρep + ρee
47.8              221.7      300      0.30     0.012    0.0025    0.312
47.8              226.3      1000     1.39     0.14     0.028     1.53
47.8              233.9      2000     2.94     0.55     0.10      3.49
47.8              249.6      4000     6.05     2.68     0.48      8.73
47.8              264.9      6000     9.15     4.75     0.81      13.90
47.8              269.9      6658     10.18    5.48     1.10      15.66
47.8              272.6      7000     10.71    5.87     1.17      16.58

b
V (bohr³/atom)    P (GPa)    T (K)    ρep      ρee      ±ρee      ρep + ρee
63                44.4       728      2.77     0.38     0.057     3.15
57.9              101        2010     5.65     1.65     0.31      7.30
55.1              146.7      3360     6.98     3.28     0.66      10.26
52.3              208        5220     8.84     4.85     0.87      13.69

c
V (bohr³/atom)    P (GPa)    T (K)    ρep      ρee      ±ρee      ρep + ρee
63                28.3       300      0.82     0.063    0.011     0.883
60                50.8       300      0.54     0.042    0.0076    0.582
56.5              84.1       300      0.39     0.024    0.0043    0.414
55                100.0      300      0.37     0.021    0.0040    0.391
58                65         383      0.59     0.054    0.011     0.644

a, DFT + DMFT calculated resistivities in Fig. 1 at Earth’s core density. b, DFT + DMFT calculated resistivities in Fig. 2a, along the Hugoniot line. c, DFT + DMFT calculated resistivities in Fig. 2b, compared with DAC
experimental results. ρep is the DFPT-calculated resistivity by ref. 1. ρee is the resistivity from our DFT + DMFT study.
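
The last column of this table is the sum of the electron-phonon and electron-electron contributions. The sketch below repeats that sum for the core-density rows (panel a) and also reports what fraction of the total comes from electron-electron scattering. The fraction is derived here, not a number quoted in the text, and the names are hypothetical.

```python
# (T in K, rho_ep, rho_ee) at Earth's core density, in units of 1e-5 ohm cm,
# copied from panel a of Extended Data Table 2.
CORE_ROWS = [
    (300, 0.30, 0.012),
    (1000, 1.39, 0.14),
    (2000, 2.94, 0.55),
    (4000, 6.05, 2.68),
    (6000, 9.15, 4.75),
    (6658, 10.18, 5.48),
    (7000, 10.71, 5.87),
]


def total_and_fraction(rho_ep, rho_ee):
    """Return (rho_ep + rho_ee, share of the total due to rho_ee)."""
    total = rho_ep + rho_ee
    return total, rho_ee / total


for temperature, rho_ep, rho_ee in CORE_ROWS:
    total, share = total_and_fraction(rho_ep, rho_ee)
    print(f"T = {temperature} K: total = {total:.3f}, e-e share = {share:.1%}")
```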
Resolving the complexity of the human genome using single-molecule sequencing


Mark J. P. Chaisson, John Huddleston, Megan Y. Dennis, Peter H. Sudmant, Maika Malig, Fereydoun Hormozdiari,
Francesca Antonacci, Urvashi Surti, Richard Sandstrom, Matthew Boitano, Jane M. Landolin, John A. Stamatoyannopoulos,
Michael W. Hunkapiller, Jonas Korlach & Evan E. Eichler


The human genome is arguably the most complete mammalian
reference assembly, yet more than 160 euchromatic gaps remain
and aspects of its structural variation remain poorly understood ten
years after its completion. To identify missing sequence and genetic
variation, here we sequence and analyse a haploid human genome
(CHM1) using single-molecule, real-time DNA sequencing. We close
or extend 55% of the remaining interstitial gaps in the human GRCh37 
reference genome—78% of which carried long runs of degenerate 
short tandem repeats, often several kilobases in length, embedded 
within (G+C)-rich genomic regions. We resolve the complete sequence 
of 26,079 euchromatic structural variants at the base-pair level, includ- 
ing inversions, complex insertions and long tracts of tandem repeats. 
Most have not been previously reported, with the greatest increases 
in sensitivity occurring for events less than 5 kilobases in size. Com- 
pared to the human reference, we find a significant insertional bias 
(3:1) in regions corresponding to complex insertions and long short 
tandem repeats. Our results suggest a greater complexity of the human 
genome in the form of variation of longer and more complex repet- 
itive DNA that can now be largely resolved with the application of 
this longer-read sequencing technology. 

Data generated by single-molecule, real-time (SMRT) sequencing
technology differ drastically from most sequencing platforms because
native DNA is sequenced without cloning or amplification, and read
lengths typically exceed 5 kilobases (kb). Despite overall lower individual
read accuracy (~85%), longer read length facilitates high-confidence
mapping across a greater percentage of the genome. We generated
~40-fold sequence coverage from a human CHM1 hydatidiform mole
using long-read SMRT sequence technology (average mapped read
length = 5.8 kb; Supplementary Table 1). We selected a complete
hydatidiform mole to sequence because it is haploid, lacking allelic variation,
and provides higher effective sequence coverage. We aligned 93.8% of
all sequence reads to the human reference genome (GRCh37) using a
modified version of BLASR (Supplementary Information) and generated
local assemblies of the mapped reads using Celera (ref. 13) and Quiver (ref. 14),
the latter of which leverages estimates of insertion, deletion and
substitution probabilities to determine consensus sequences accurately. We
compared the consensus sequences of regions with previously sequenced
and assembled large-insert bacterial artificial chromosome (BAC) clones
generated from CHM1tert (ref. 15). The comparison shows a consensus
sequencing concordance of >99.97% (phred quality = 37.5), with 72%
of the errors confined to indels within homopolymer stretches
(Supplementary Table 3).
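
The two accuracy figures in the last sentence are linked by the standard phred definition, Q = −10 log₁₀(error rate). The sketch below shows that conversion in both directions. It is a generic illustration, not the authors' evaluation code, and the function names are hypothetical.

```python
import math


def phred_from_error(error_rate):
    """Standard phred score: Q = -10 * log10(per-base error rate)."""
    return -10.0 * math.log10(error_rate)


def error_from_phred(q):
    """Inverse of the phred definition: error rate = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# A phred quality of 37.5 corresponds to a per-base concordance of
# 1 - error_from_phred(37.5), which lies above the 99.97% quoted in the text.
concordance = 1.0 - error_from_phred(37.5)
print(f"{concordance:.5%}")
```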

We initially assessed whether the mapped reads could facilitate clos- 
ure of any of the 164 interstitial euchromatic gaps within the human 
reference genome (GRCh37). We extended into gap regions using a 
reiterative map-and-assemble strategy, in which SMRT whole-genome 
sequencing (WGS) reads mapping to each edge of a gap were assembled 
into a new high-quality consensus, which, in turn, served as a template 


for recruiting additional sequence reads for assembly (Supplementary 
Information). Using this approach, we closed 50 gaps and extended into 
40 others (60 boundaries), adding 398 kb and 721 kb of novel sequence 
to the genome, respectively (Supplementary Table 4). The closed gaps 
in the human genome were enriched for simple repeats, long tandem 
repeats, and high (G+C) content (Fig. 1) but also included novel exons 
(Supplementary Table 20) and putative regulatory sequences based on 
DNase I hypersensitivity and chromatin immunoprecipitation followed 
by high-throughput DNA sequencing (ChIP-seq) analysis (Supplementary
Information). We identified a significant 15-fold enrichment of short
tandem repeats (STRs) when compared to a random sample (P < 0.00001)
(Fig. 1a). A total of 78% (39 out of 50) of the closed gap sequences were
composed of 10% or more of STRs. The STRs were frequently embedded
in longer, more complex, tandem arrays of degenerate repeats reaching
up to 8,000 bp in length (Extended Data Fig. 1a–c), some of which
bore resemblance to sequences known to be toxic to Escherichia coli.
Because most human reference sequences have been derived from
clones propagated in E. coli, it is perhaps not surprising that the
application of a long-read sequence technology to uncloned DNA would
resolve such gaps. Moreover, the length and complex degeneracy of these
STRs embedded within (G+C)-rich DNA probably thwarted efforts to
follow up most of these by PCR amplification and sequencing.
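
The reiterative map-and-assemble idea described in this paragraph can be pictured with a toy loop: reads overlapping the current gap edge extend the consensus, and the extended consensus then recruits further reads. The sketch below is a deliberately simplified stand-in. It uses exact suffix/prefix matching on error-free strings, whereas the study maps noisy reads with BLASR and assembles them with Celera and Quiver. All names and the toy sequences are hypothetical.

```python
def best_extension(contig, reads, min_overlap):
    """Return the longest contig obtainable by appending one read whose
    prefix exactly matches a suffix of the contig, or None if none fits."""
    best = None
    for read in reads:
        # The overlap must leave at least one new base in the read.
        max_len = min(len(contig), len(read) - 1)
        for overlap in range(max_len, min_overlap - 1, -1):
            if contig.endswith(read[:overlap]):
                candidate = contig + read[overlap:]
                if best is None or len(candidate) > len(best):
                    best = candidate
                break  # longest overlap for this read has been found
    return best


def extend_into_gap(seed, reads, min_overlap=4, max_rounds=10):
    """Repeatedly use the current consensus as the template for recruiting
    the next read, stopping when no read extends it or rounds run out."""
    contig = seed
    for _ in range(max_rounds):
        extended = best_extension(contig, reads, min_overlap)
        if extended is None:
            break
        contig = extended
    return contig


toy_reads = ["ACGGTTCA", "TTCAGGAT", "CCCCCCCC"]
print(extend_into_gap("ACGTACGG", toy_reads))
```

The `max_rounds` cap matters even in this toy. A read made of a tandem repeat can keep extending the contig indefinitely, which hints at why long degenerate repeats make gap closure hard.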

Next, we developed a computational pipeline (Extended Data Fig. 2)
to characterize structural variation systematically (structural variation
defined here as differences ≥50 bp in length, including deletions,
duplications, insertions and inversions). Structural variants were discovered
by mapping SMRT sequencing reads to the human reference genome
and searching for specific mapping signatures (Supplementary
Information). At every variant locus, we recruited all uniquely mapping reads,
created a local de novo assembly, defined breakpoints compared to the
human reference, and classified each structural variant by type and
probable mechanism (Table 1). We identified a total of 26,079 insertions/
deletions ≥50 bp within the euchromatic portion of the genome. Almost
all insertion and deletion breakpoints were resolved at the single-base-pair
level, generating one of the most comprehensive catalogues of structural
variation (47,238 breakpoint positions). A total of 6,796 of the events
map within 3,418 genes with a subset of events (169) corresponding to
variation in the spliced transcripts of 140 genes (Supplementary Table 9).
From all targeted sequencing experiments combined (Supplementary
Information) we estimate an overall validation rate of 97%, of which only
a fraction can be detected by application of Illumina next-generation
sequencing.

Figure 1 | Sequence content of gap closures. a, Gap closures are enriched
for simple repeats compared to equivalently sized regions randomly sampled
from GRCh37. b, Human genome gaps typically consist of (G+C)-rich
sequence (yellow) flanking complex (A+T)-rich STRs (green) (empirical
P value; Supplementary Information). Red line indicates genomic (G+C)
content.

Table 1 | Structural variation between CHM1 and GRCh37

                                 Insertion                           Deletion                            Ins/del
Category                         Number   Mean length  Total bases   Number   Mean length  Total bases   Total events  Total bases
STR >10 bp                        6,007           295    1,771,948    2,986            90      268,075           2.01         6.61
  STR ≥50 bp                      4,289           398    1,706,524    1,530           139      212,957           2.80         8.01
  STR >10, <50 bp                 1,718            38       65,424    1,456            38        5,518           1.18        11.86
Tandem repeat                     2,760           303      836,474    2,398           182    4,361,598           1.15         0.19
MEI                               2,149           497    1,200,647    2,084           428      841,617           1.03         1.43
  AluY                              859           302      259,810      859           302      259,220           1.00         1.00
  LINE/L1Hs                         145         2,412      349,780      141         2,411      339,971           1.03         1.03
  SVA                               457           369      168,762      382           274      104,589           1.20         1.61
  HERV                               58           338       19,619       60           180       10,779           0.97         1.82
  Alu+STR/Alu+mosaic                287           413      118,486      186           262       46,905           1.54         2.53
  Inactive                          343           226       77,602      456           176       80,153           0.75         0.97
Centromeric satellites              669           693      463,687      817           722      590,223           0.82         0.79
  HSAT                               46           861       39,604       48           790       37,935           0.96         1.04
  ALR                               622           681      423,453      769           718      552,288           0.81         0.77
Other                               168           112       18,790      277            98       27,144           0.61         0.69
Complex                           1,115         1,927    2,148,642      317         2,066      654,834           3.52         3.28
Unannotated                       2,386            60      143,598    2,013            62      143,559           1.03         1.00
Total                            17,851           398    7,112,381   11,819           271    3,208,633           1.51         2.22
Euchromatic subtotal             15,776           390    6,149,335   10,303           248    2,559,644           1.53         2.40
Euchromatic subtotal (≥50 bp)     9,638           542    5,237,445    6,111           358    2,189,837           1.58         2.39

The statistics of insertion and deletion events in CHM1 compared to GRCh37 are listed by sequence category. Low complexity sequence is divided between STRs and variable number tandem repeats
(Supplementary Information). AluY, L1Hs, SVA and HERV are active mobile elements. Alu indel events in conjunction with STR sequences or mosaic Alu are considered separately from solitary AluY mobile element
insertions (MEIs). Inactive mobile element insertions include L1P and AluS. Rarely observed elements (<10) are combined as ‘Other’. Classes of structural variation showing an insertional bias (>2.5-fold excess in
CHM1) are in bold.

Of all copy number differences found, 85% were novel compared to
previous studies of structural variation, in large part owing to increased
ascertainment of smaller variation (average length 497 bp). The effect
was most pronounced for insertions in which 92% of all differences had
not been previously reported, in contrast to deletions in which 69% of
the events were novel (Fig. 2). When comparing the size distribution of
insertions and deletions between the two haplotype references, we found
that insertions within CHM1 were longer and more abundant with 5,473
additional insertion events when compared to the human reference
(Table 1). This difference contributes to a significant insertional bias of
3.9 megabases (Mb) of additional sequence either missing or expanded
when compared to the human reference (Table 1). We find a substantial
increase in the amount of long, ≥50 bp STR insertions relative to
deletions (P < 2.2 × 10⁻¹⁶), including STRs within genes (Supplementary
Table 9). In addition to being 2.80 times more frequent than deletions,
the STR insertions ≥50 bp are, on average, 2.87 times longer. This
asymmetry becomes more pronounced with increasing STR insertion
length (Fig. 2b). The genomic distribution of STR insertions is highly
non-random, being biased to the last 5 Mb of human chromosomes
(Extended Data Fig. 3), correlating with recombination rate (r = 0.21)
and human–chimpanzee divergence (r² = 0.20). We note that 2,285 of
these expanded STRs occur within genes, including 11 within an
untranslated region (noting shorter insertions in FMR1 and C9orf72, a
common mutated locus for amyotrophic lateral sclerosis; Supplementary
Information) and two within the coding sequence of genes (MUC2 and
SAMD1). A total of 189 genes have an STR expansion >1 kb, representing
potential sites of genomic instability (Supplementary Table 9).
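
Several of the figures in this paragraph can be recomputed from Table 1. The sketch below does so under one stated assumption: the 5,473 additional events are taken from the euchromatic subtotal row and the 3.9 Mb excess from the Total row. That pairing is a reading of the table, not something the text spells out, and the names are hypothetical.

```python
# Counts and lengths copied from Table 1 (CHM1 versus GRCh37).
EUCHROMATIC_EVENTS = {"insertions": 15_776, "deletions": 10_303}
TOTAL_BASES = {"insertions": 7_112_381, "deletions": 3_208_633}
STR_GE_50 = {"ins_n": 4_289, "ins_mean_bp": 398, "del_n": 1_530, "del_mean_bp": 139}


def insertion_excess(row):
    """Insertions minus deletions for one table row."""
    return row["insertions"] - row["deletions"]


def str_ratios(row):
    """Return (frequency ratio, mean-length ratio) of insertions to deletions."""
    frequency = row["ins_n"] / row["del_n"]
    length = row["ins_mean_bp"] / row["del_mean_bp"]
    return frequency, length


extra_events = insertion_excess(EUCHROMATIC_EVENTS)
extra_mb = insertion_excess(TOTAL_BASES) / 1e6
freq_ratio, length_ratio = str_ratios(STR_GE_50)
print(extra_events, round(extra_mb, 1), round(freq_ratio, 2), round(length_ratio, 2))
```

The length ratio computed from the rounded mean lengths in the table comes out slightly below the 2.87 quoted in the text, presumably because the published ratio used unrounded means.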

Figure 2 | Structural variation analyses. a, Histograms display the
distribution of novel insertions (black/grey) and deletions (red/pink)
between CHM1 and GRCh37 haplotypes compared to copy number variants
identified from other studies for insertions and deletions less than 1 kb (left)
and greater than or equal to 1 kb (right). Most of the increased sensitivity occurs
below 5 kb. Peaks at ~300 bp and 6 kb correspond to Alu and L1 insertions,
respectively. b, STR insertions in CHM1 (green) are longer than the human
genome (blue; GRCh37), and this effect becomes more pronounced with
increasing length (x axis). c, The percentage repeat composition (x axis) of
1-kb sequences flanking insertion sites for Alu, L1 and SVA mobile element
insertions. Insertion calls from the 1000 Genomes Project (pink) compared to
calls from CHM1 using SMRT reads (blue) show increased sensitivity for
repeat-rich insertions.

Figure 3 | CHM1 clone-based assembly of the human 10q11 genomic
region. The clone-based assembly is composed primarily of BACs from the
CH17 library as shown in the tiling path below the internal repeat structure of

The remaining half of the insertional bias (~1.5 Mb) was accounted
for by 1,116 more complex structural variants (which we define as
insertions having either several annotated repeat elements, or at least 30% of
the remaining sequence not annotated as repeat) (Table 1 and Extended
Data Fig. 4). Sequence analyses of these regions of the genome revealed
these insertions were frequently embedded within regions already enriched
for clusters of mobile element insertions. Complex repetitive regions
such as these represent a major challenge in structural variant detection
owing to spurious mapping of short-read sequence data. We performed
site complexity analysis of annotated mobile element insertion loci
by assessing the repeat composition of the 1-kb sequences 5′ and 3′
flanking the retrotransposons AluY, L1 and SVA insertions in both the
CHM1 sequencing data and insertion sites from population-scale low- 
coverage sequencing data’'. While we observed a small bias in the re- 
peat complexity of AluY insertions (53% versus 48%; P = 4.8 X 10°, 
Kolmogorov-Smirnov test), a much more marked shift is seen for L1 
and SVA insertions. We found that human-specific L1Hs insertion sites 
in CHM1 have a flanking common repeat content of 59% when com- 
pared to 39% in the 1000 Genomes Project data set (P = 1.8 X 100" 
Kolmogorov-Smirnov test) (Fig. 2c). The bias for SVA insertions is even 
greater, with 76% of insertions mapping adjacent to repeats when 
compared to 50% using Illumina read-pair data (P = 3.84 X 10 oe 
Kolmogorov-Smirnov test).

The large STR and complex insertions are enriched for regions anno- 
tated as having potential clone assembly problems. This enrichment 
becomes more pronounced the larger and more complex the insertion 
(for example, the 185-fold enrichment of ‘black tag’ annotations for 
STR insertions; Supplementary Information). Notably, less than 1% of 
these variants are present in newer assemblies of the human genome, 
including GRCh38 and CHM1.1 (ref. 22) (derived primarily by Illumina 
sequencing technology). Because we find evidence of most of these 
complex events in additional human or chimpanzee genomes (Sup- 
plementary Information), we propose that ~1,700 sites (3.5 Mb) rep- 
resent deficiencies or ‘muted’ gaps that can now be accessed as a result 
of SMRT technology (Supplementary Table 7). We incorporated these 
inserted sequences as well as gap closures into a patched GRCh37 ref- 
erence, effectively mapping 0.026% additional Illumina reads and dis- 
covering additional single nucleotide polymorphisms (SNPs) (for example, 
9,231 SNPs; Supplementary Information). 

In addition to insertions and deletions, we also searched for the pres- 
ence of inversions—a structural variation class that is notoriously dif- 
ficult to ascertain. We developed a search algorithm that specifically 
leveraged the increased length of the SMRT sequence reads to search 
for ‘reversals’ in order when aligned to the reference. Regions with two 
or more reversals were then locally assembled to define the breakpoints 
of each event optimally. We identified 34 inversions with an average 
length of 7.1 kb, corresponding to a total of ~240 kb of inverted sequence 
(Supplementary Table 8 and Supplementary Fig. 6). We subcloned and 
sequenced 15 events using a large-insert BAC library with a validation 
rate of 100% (15 out of 15) (Extended Data Fig. 5). None of the events 
disrupted genes, no enrichment was observed on the X chromosome, 

the region. Coloured arrows indicate large segmental duplications (SDs) with
homologous sequences connected by lines generated by Miropeats.


and 68% (23 out of 34) of the inversions were flanked by inverted 
repeats (Supplementary Table 8). 
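
The reversal search described in this paragraph can be sketched as follows. This is an illustrative toy, not the authors' algorithm. It assumes each long read is given as a list of aligned pieces with a strand, it treats a strand flip between consecutive pieces as a "reversal", and it reads "two or more reversals" as two or more reversal-carrying reads in the same reference bin. The bin size and all names are hypothetical.

```python
from collections import Counter


def read_has_reversal(segments):
    """segments: list of (read_start, strand) for one read's aligned pieces.
    A reversal is a strand change between consecutive pieces in read order."""
    strands = [strand for _, strand in sorted(segments)]
    return any(a != b for a, b in zip(strands, strands[1:]))


def candidate_regions(reads, bin_size=10_000, min_support=2):
    """reads: list of (reference_position, segments).
    Returns the start of every bin holding >= min_support reversal reads."""
    counts = Counter()
    for ref_pos, segments in reads:
        if read_has_reversal(segments):
            counts[ref_pos // bin_size] += 1
    return sorted(b * bin_size for b, n in counts.items() if n >= min_support)


toy_reads = [
    (12_000, [(0, "+"), (3_000, "-"), (5_000, "+")]),
    (15_500, [(0, "+"), (2_500, "-")]),
    (48_000, [(0, "+"), (4_000, "-")]),   # only one supporting read in this bin
    (70_000, [(0, "+"), (4_000, "+")]),   # no strand change
]
print(candidate_regions(toy_reads))
```

Regions returned by such a filter would then be handed to local assembly to place the breakpoints. As a check on the reported totals, 34 inversions at an average of 7.1 kb gives about 241 kb, in line with the ~240 kb quoted.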

A limitation of our approach is its dependence on the local assembly 
of mapped reads to the human reference genome. Even with an average 
mapped read length of 5.8 kb, not all reads may be uniquely mapped to 
a specific location. As a result, gaps (n = 82) adjacent to segmental dup- 
lications were largely unresolved, inversions exceeding the read length 
(>20 kb) could not be detected (for example, 15q13.3 region), and
SMRT sequence read synthesis within or flanking long, highly identical 
repeats could not be reliably assembled. We identified a total of 737 
euchromatic regions (12.5 Mb) of our genome, in which large-scale mapping
inconsistencies (n = 22) or deficiencies (n = 715) were noted but
were unresolvable by this approach (Supplementary Tables 26 and 27). 
We selected one 6.5-Mb region mapping to chromosome 10q11.23 
for a more detailed analysis. The region carried seven gaps within the 
human reference genome (GRCh37), none of which was resolved or 
extended by SMRT WGS reads. We applied an alternative clone-based 
hierarchical approach (Supplementary Information) and identified a 
tiling path of 32 BACs and assembled the clone inserts using SMRT 
sequencing. We generated sequence contigs spanning two large clusters
of segmental duplication (2.7 and 1.2 Mb), closing six of the seven gaps 
in this region (Fig. 3 and Extended Data Fig. 6), adding 416 kb of miss- 
ing reference sequence, correcting the orientation of 1,451 kb, and elim- 
inating 856 kb of redundant sequence that was represented twice within 
the reference. Two gaps remain, each at the same location within para- 
logous segmental duplications, corresponding to a nearly perfect 50-kb 
tandem repeat that cannot be resolved at the level of large-insert clones 
using existing methods. These results indicate that although it is possible
to use reads to close gaps and detect variation missed by other
next-generation sequencing methods, the resolution of larger, complex regions
of the genome still requires targeted efforts that leverage both clones and
WGS data. Complete de novo assembly of human genomes will probably 
require the development of even longer-range sequencing data. The 
approaches outlined here will have broader application to many of the 
unfinished and complex regions of mammalian genomes. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


SMRT WGS data (41-fold sequence coverage) was generated using a Pacific
Biosciences RSII instrument (P5C3 chemistry) from genomic libraries generated from
a complete hydatidiform mole DNA (CHM1tert). Sequence reads were mapped to
the human reference genome (GRCh37) using a modified version of BLASR
(http://www.github.com/EichlerLab/blasr) (Supplementary Methods); a bioinformatics
pipeline was developed to identify regions of structural variation and extensions into
gaps (http://www.github.com/EichlerLab/chm1_scripts); corresponding sequence
reads were de novo assembled and a high-quality consensus sequence generated for
each region using Celera v.8.1 (ref. 13) and Quiver v.0.7.6 (ref. 14). Reads are selected
for support of a variant if the mapping quality is greater than 20; a minimum of
5 reads are required to trigger an assembly. For the purpose of this analysis, we focused
only on the euchromatic portion of the genome excluding pericentromeric regions
(5 Mb flanking annotated centromeres), all acrocentric portions of chromosomes,
and subtelomeric regions (150 kb from the annotated telomeric sequence). Repeat
content of all structural variants was determined using CENSOR, RepeatMasker,
Miropeats and TRF (http://tandem.bu.edu/). The sequence accuracy of the assemblies
and structural variant polymorphisms were inferred by comparison to 18 sequenced
large-insert BAC (CH17) and 89 fosmid clones, Sanger-based BAC-end sequence
generated for CHM1tert (GenBank accessions in Supplementary Table 35), and
comparison to Illumina-based WGS generated for human genomes. We also
generated Illumina WGS data (41-fold) for comparison (SRA SRP044331). For the
chromosome 10q11 region, 125 CH17 BACs were identified and sequenced using a
Nextera-Illumina protocol. A minimal tiling path of 35 clones was deeply sequenced
(300-fold coverage) using 1 SMRT cell per clone; inserts were assembled and an
alternative reference was created using methods described previously.
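
Two of the selection rules stated in the Methods are simple enough to sketch: the read-support threshold (mapping quality greater than 20, at least 5 reads per locus) and the euchromatic filter (5 Mb around centromeres, 150 kb at chromosome ends). The code below is an illustration under assumptions. The input tuple layout is hypothetical, telomeres are approximated by the chromosome ends, and acrocentric arms are not handled.

```python
from collections import defaultdict

MIN_MAPQ = 20                       # reads need mapping quality greater than this
MIN_READS = 5                       # minimum supporting reads to trigger an assembly
PERICENTROMERIC_FLANK = 5_000_000   # 5 Mb either side of the centromere
SUBTELOMERIC_FLANK = 150_000        # 150 kb from each chromosome end (assumed telomere)


def loci_to_assemble(read_hits, min_mapq=MIN_MAPQ, min_reads=MIN_READS):
    """read_hits: iterable of (locus_id, read_name, mapq).
    Returns the loci supported by enough confidently mapped distinct reads."""
    support = defaultdict(set)
    for locus, read_name, mapq in read_hits:
        if mapq > min_mapq:
            support[locus].add(read_name)
    return sorted(locus for locus, names in support.items() if len(names) >= min_reads)


def is_euchromatic(pos, chrom_len, cen_start, cen_end):
    """True if pos lies outside the subtelomeric and pericentromeric zones."""
    if pos < SUBTELOMERIC_FLANK or pos >= chrom_len - SUBTELOMERIC_FLANK:
        return False
    if cen_start - PERICENTROMERIC_FLANK <= pos < cen_end + PERICENTROMERIC_FLANK:
        return False
    return True


hits = [
    ("chr1:1000", "r1", 60), ("chr1:1000", "r2", 30), ("chr1:1000", "r3", 21),
    ("chr1:1000", "r4", 45), ("chr1:1000", "r5", 25), ("chr1:1000", "r6", 20),
    ("chr2:500", "a", 60), ("chr2:500", "b", 60),
]
print(loci_to_assemble(hits))
```

Note that a read with mapping quality exactly 20 is rejected here, since the Methods say "greater than 20".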
Extended Data Figure 1 | Sequence content of gap closures. a–c, Gap
closures are enriched for simple repeats compared to equivalently sized
regions randomly sampled from GRCh37; examples of the organization
of these regions are shown using Miropeats for chromosome 4
[…] between blocks are indicated by colour.

Extended Data Figure 2 | Variant detection pipeline. At every variant locus,
we collected the full-length reads that overlap the locus, performed de novo
assembly using the Celera assembler, and called a consensus using Quiver after
remapping reads used in the assembly as well as reads flanking the assembly
(yellow reads) to increase consensus quality at the boundaries of the assembly.
BLASR is used to align the assembly consensus sequences to the reference,
[…]

[…] Reads spanning a deletion event within a single alignment are shown as bars
connected by a solid line, and double hard-stop reads spanning a larger deletion
event and split into two separate alignments of the same read are shown as a
dotted line.

Extended Data Figure 5 | Inversion validation by BAC-insert sequencing.
Inversions detected by alignment of single long reads were validated by
sequencing clones from the CHM1 BAC library (CHORI17), in which end
mappings to GRCh37 spanned the putative inversions. Inversions were
validated by aligning the corresponding BAC sequences to GRCh37 with
Miropeats. Shared sequence between the BACs and GRCh37 is shown in black;
inversion events are indicated in red.

©2015 Macmillan Publishers Limited. All rights reserved 


[Figure: tiling path of CH17 BAC clones with gene annotations; panel labels: Unique sequence (1.4 Mb), Contig 1 (2.7 Mb), Contig 2 (1.2 Mb), GRCh37, SegDupMasker; x axes in kb.]

Extended Data Figure 6 | CHM1 clone-based assembly of the human 10q11 […] annotated from alignment of RefSeq messenger RNA sequences with GMAP
Interception of host angiogenic signalling limits mycobacterial growth


Stefan H. Oehlers, Mark R. Cronan, Ninecia R. Scott, Monica I.
Rebecca W. Beerman, Philip S. Crosier & David M. Tobin


Pathogenic mycobacteria induce the formation of complex cellular
aggregates called granulomas that are the hallmark of tuberculosis.
Here we examine the development and consequences of vascularization
of the tuberculous granuloma in the zebrafish–Mycobacterium
marinum infection model, which is characterized by organized granulomas
with necrotic cores that bear striking resemblance to those of
human tuberculosis. Using intravital microscopy in the transparent
larval zebrafish, we show that granuloma formation is intimately 
associated with angiogenesis. The initiation of angiogenesis in turn 
coincides with the generation of local hypoxia and transcriptional 
induction of the canonical pro-angiogenic molecule Vegfaa. Pharmacological
inhibition of the Vegf pathway suppresses granuloma-associated
angiogenesis, reduces infection burden and limits dissemination.
Moreover, anti-angiogenic therapies synergize with the
first-line anti-tubercular antibiotic rifampicin, as well as with the
antibiotic metronidazole, which targets hypoxic bacterial populations.
Our data indicate that mycobacteria induce granuloma-associated 
angiogenesis, which promotes mycobacterial growth and increases 
spread of infection to new tissue sites. We propose the use of anti- 
angiogenic agents, now being used in cancer regimens, as a host-targeting 
tuberculosis therapy, particularly in extensively drug-resistant dis- 
ease for which current antibiotic regimens are largely ineffective. 

The human tuberculous granuloma, a tightly cohesive cellular structure
that houses infecting mycobacteria, develops hypoxic areas around
its necrotic core. In tumours, the development of hypoxia is tightly linked
to angiogenesis and subsequent metastasis. In tuberculosis, attention has
focused on the possible consequences of granuloma hypoxia to bacterial
physiology, but relatively little attention has been paid to the functional
significance of findings that tuberculous granulomas are extensively
vascularized.

In its natural ectothermic hosts, M. marinum, the closest relative of
the M. tuberculosis complex, causes a disease called fish tuberculosis, a
systemic wasting disease with organized epithelioid granulomas with
necrotic cores. In zebrafish larvae, mycobacterium-infected macrophages
form early granulomas, undergo a hallmark epithelioid transformation,
and activate granuloma-specific gene expression programs.

To monitor host vasculature in zebrafish, we used the Tg(kdrl:egfp) 
line (referred to hereafter as Tg(flk1:eGFP)), in which vascular endo- 
thelial cells are fluorescently labelled with enhanced green fluorescent 
protein (eGFP)"°. Injection of mycobacteria into the most commonly 
used caudal vein site results in granulomas in the immediate vicinity of 
the richly vascularized area of the caudal haematopoietic tissue (CHT) 
(Fig. 1a). To determine whether a different injection site with sparser 
and smaller blood vessels was more suitable to detect angiogenesis, we 
assessed primary granulomas that typically formed dorsally after injec- 
tion into the trunk (Fig. 1b, c). As with caudal vein injection, trunk in- 
jection resulted in most bacteria becoming resident in macrophages in 
granulomas (Extended Data Fig. 1a). Trunk granulomas progressed
similarly to caudal vein granulomas, with dissemination into richly
vascularized areas (Supplementary Video 1). Imaging of
Tg(mpeg1:tdTomato-caax) larvae, in which macrophages are labelled by
membrane-bound Tomato, revealed similar macrophage dynamics, including
the interstitial egress of infected macrophages, the transfer of M. marinum
between granulomas, and coalescence of distal bacteria into existing gran- 
ulomas (Supplementary Videos 2, 3, 4). Some infected macrophages in- 
vaded vasculature around the primary granuloma (Extended Data Fig. 1b 
and Supplementary Video 5). 

Imaging of trunk-infected larvae revealed angiogenesis, with growth 
of vasculature around sites of infection. We observed sprouting from 
the existing intersegmental vessels (ISVs) starting at 4 days post-infection 
(dpi), just after the formation of granulomas (Fig. 1c). We assessed the 
features of vessel sprouting through long-term live imaging of infected 
larvae (Supplementary Videos 6, 7, 8, 9). Vessel growth occurred in spurts
with extended periods of quiescence or even reversed directionality
(Extended Data Fig. 1c).

Figure 1 | M. marinum infection induces angiogenesis in the zebrafish
infection model. a, Tomato-fluorescent M. marinum granuloma in
the CHT region of a Tg(flk1:eGFP) larva. White arrowhead indicates area of
occlusion in the posterior cardinal vein caused by M. marinum
granuloma. Yellow arrowhead indicates area of normal posterior cardinal
vein width anterior of occlusion. b, Schematic depicting location of
injection into the trunk of a 2 days post-fertilization larva. CV, caudal
vein. c, Time-lapse images of vascular growth around a trunk granuloma
from a single Tg(flk1:eGFP) larva from 4 dpi to 6 dpi. Top,
Tomato-fluorescent M. marinum and labelled vasculature. Bottom, only
Tg(flk1:eGFP)-labelled vasculature. Blue arrowhead tracks the growth of a
single vessel across all frames. Images are representative of 10 (a) and
20 (c) individual animals. Scale bars, 100 µm.

To examine the mode of vascular elongation, we used blue-fluorescent 
M. marinum to infect the transgenic zebrafish line Tg(fli1a:nlseGFP;
flk1:mCherry), in which endothelial nuclei are marked by eGFP
expression. At 4 dpi, nuclei left the highly organized ISVs, always towards
sites of infection, and subsequently divided within the somites (Extended
Data Fig. 1d and Supplementary Video 10). Vessels sprouted from both
arterial and venous ISVs (Extended Data Fig. 1e).

We determined whether new blood vessels generated around the 
granuloma were functional. Using DsRed-labelled erythrocytes in the 
transgenic line Tg(flk1:eGFP; gata1:DsRed), we found substantial blood
flow through both ectopic vessels that spanned existing vessels com- 
pletely and into newer blind-ending vessels (Extended Data Fig. 1f). 

Angiogenesis required persistent M. marinum infection; it did not 
develop after injection of PBS, heat-killed M. marinum or non-pathogenic 
Escherichia coli (Extended Data Fig. 2a). Tumour-associated macro- 
phages are important drivers of tumour angiogenesis upon tumour 
hypoxia. Since macrophages serve as a principal repository of virulent
mycobacteria, we assessed whether there were differences in vascular re- 
cruitment between macrophage-resident and extracellular mycobacteria. 
Infection of double-transgenic Tg(flk1:eGFP, mpeg1:tdTomato-caax)
embryos with Cerulean-fluorescent M. marinum allowed us to discrim- 
inate between intracellular and extracellular bacteria (Fig. 2a). Enumer- 
ation of vascular branching revealed an elevated vascularization rate for 
intracellular compared with extracellular foci (odds ratio (OR) and 95% 
confidence interval: 4 dpi, 6.63 (2.57-17.11); 5 dpi, 6.93 (1.51-31.75); 
6 dpi, 27.84 (10.18-76.18)) (Extended Data Fig. 2b). 
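
The odds ratios above compare how often intracellular and extracellular foci recruited vessels. As a minimal sketch of how such an estimate can be derived from a 2×2 table, the following uses the Woolf (log-normal) approximation for the 95% confidence interval. The Woolf method, the function name and the example counts are illustrative assumptions: the text reports only the totals of foci analysed and does not state how the intervals were calculated.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval.

    a: intracellular foci with recruited vessels
    b: intracellular foci without recruited vessels
    c: extracellular foci with recruited vessels
    d: extracellular foci without recruited vessels
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the paper.
estimate, lower, upper = odds_ratio_ci(30, 70, 10, 90)
print(f"OR {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1, as in all three reported time points, indicates that vessel recruitment differs between the two classes of foci.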

To demonstrate a functional requirement for macrophages in recruit- 
ing vasculature we performed morpholino knockdown of the transcrip- 
tion factor Pu.1 (also known as Spi-1), which fully ablates all macrophages 
until 5 days post-fertilization (dpf) (Extended Data Fig. 2c). As previously
reported, infection burden was markedly increased in the Pu.1
knockdown animals (Extended Data Fig. 2d). Despite increased burden, 
the total length of abnormal vasculature was decreased in morpholino- 
injected animals compared with controls, suggesting that macrophages 
specifically mediate new vessel growth in the context of inflammation 
(Fig. 2b). 

Vascularization coincided with the formation of granulomas at around 
4 dpi. We disrupted mycobacterium-driven granuloma formation using 
an M. marinum strain deficient in the ESX1 protein export system (ΔESX1),
which could not form granulomas. We infected larvae with ΔESX1 at a 7.5-fold
excess over wild type to generate equivalent bacterial burdens at 4 dpi
(Extended Data Fig. 2e). As expected, both strains were predominantly
macrophage resident, but ΔESX1 infection resulted in fewer granulomas
and a marked reduction in angiogenesis (Fig. 2c). Thus, macrophage
residence per se is not sufficient to induce angiogenesis. Rather,
macrophages that have undergone further differentiation to form gran- 
ulomas appear to be required. 

When we analysed the relationship between the size of individual 
infection foci and the length of recruited vasculature we found a strong 
correlation (Extended Data Fig. 3a). This relationship is reminiscent of 
the angiogenic switch, in which tumour size is related directly to the
requirement for vascularization due to the development of local hypoxia.

We therefore asked whether hypoxia develops within granulomas. 
Activation of Hif-1α increases transcription of prolyl hydroxylase 3 (phd3;
also known as egln3), which serves as a reporter for hypoxic conditions
in zebrafish larvae. We detected robust expression of phd3 within
trunk granulomas, but not CHT granulomas, which are already proximate
to the vasculature (Extended Data Fig. 3b), consistent with a lack
of hypoxia in the CHT. Conversely, to assess whether mycobacteria
within granulomas were experiencing hypoxia, we investigated the effect
of metronidazole treatment. Metronidazole specifically kills anaerobically
growing bacteria, including anaerobically growing M. tuberculosis,
but is ineffective during aerobic growth. We used a nitroreductase-expressing
transgenic line, Tg(lyzC:ntr-p2A-lanYFP), to titrate a
biologically active dose of metronidazole (Extended Data Fig. 3c). Consistent
with the phd3 staining, metronidazole treatment reduced bacterial
burden in trunk-infected animals but not caudal-vein-infected
larvae (Fig. 2d).

Figure 2 | M. marinum induces vascularization through granuloma
formation in cooperation with host leukocytes and the expression of Vegf.
a, Cerulean-fluorescent M. marinum distribution in a Tg(flk1:eGFP,
mpeg1:tdTomato-caax) double-transgenic larva. Blue arrows indicate sites of
extracellular bacterial growth, red arrows indicate sites of intracellular
containment. Image is representative of 48 individual animals. b, c, Length of
abnormal vasculature in 5 dpi control and Pu.1 morphant larvae (b), and
4 dpi larvae infected with wild-type (WT) or ΔESX1 Tomato-fluorescent
M. marinum (c). MO, morpholino. Student’s t-test with Welch’s correction,
all data are pooled from two biological replicates. d, Bacterial burden in caudal-
vein (CV)-infected (left) and trunk-infected larvae (right) treated with 5 mM
metronidazole. Student’s t-test, data are pooled from three biological replicates.
e, Distribution of Tomato-fluorescent M. marinum in a Tg(flk1:eGFP) larva
(left) and corresponding whole-mount in situ hybridization detection of
vegfaa expression in the same larva (right). Red arrowheads indicate sites of
M. marinum granulomas. Image is representative of 20 individual animals.
Scale bars, 100 µm. Error bars represent mean ± standard deviation (s.d.).

In tumours, hypoxia is known to induce VEGF expression, which in 
turn stimulates angiogenesis. VEGF has been associated with tuberculosis
pathogenesis: it is induced in active pulmonary tuberculosis,
and, in a rat corneal model, mediates neovascularization in granulomas
triggered by mycobacterial trehalose-6,6’-dimycolate. Vegf has a conserved
role in homeostatic blood vessel recruitment in zebrafish. We
observed strong vegfaa expression around sites of mycobacterial gran- 
ulomas in the trunk (Fig. 2e). Similar expression levels were also observed 
around CHT granulomas, indicating that initiation of pro-angiogenic 
signalling is not dependent on prior development of hypoxia during 
infection (Extended Data Fig. 4a). Analysis of stained sections revealed 
cells around the edge of mycobacterial granulomas expressing vegfaa, 
an observation consistent with a primarily macrophage-driven express- 
ion pattern that could be reduced by macrophage depletion (Extended 
Data Fig. 4b). In addition to its role in angiogenesis, VEGF plays an 
important part in driving vascular permeability. We performed microangiography
on infected animals and found increased vascular leakage,
suggesting a local effect of vegfaa expression around infection foci (Ex- 
tended Data Fig. 4c). 

Figure 3 | Inhibition of VEGFR signalling reduces M. marinum
pathogenicity in zebrafish larvae. a, Bacterial burden in trunk-infected
pazopanib-treated (left) and SU5416-treated (right) larvae. Student’s t-test,
data are pooled from two (left graph) or three (right graph) biological replicates.
DMSO, dimethylsulphoxide. b, Bacterial dissemination in untreated and
pazopanib-treated larvae. Total number of granulomas and larvae analysed:
untreated, 77 granulomas from 18 larvae; pazopanib, 130 granulomas from
22 larvae. Fisher’s exact test. c, Expression of phd3 hypoxia marker in untreated
and pazopanib-treated infected larvae detected by in situ hybridization.
Total number of larvae analysed: 52 (DMSO); 30 (pazopanib). Fisher’s exact
test, data are from a single technical replicate of two pooled biological replicates.
d, Bacterial burden in pazopanib-treated, metronidazole (MET)-treated,
and pazopanib and metronidazole treated larvae. One-way analysis of variance
(ANOVA) with Tukey’s post-test, data are pooled from three biological
replicates. e, Bacterial burden in rifampicin (RIF)-treated, SU5416 and
rifampicin treated, and SU5416-treated larvae. One-way ANOVA with Tukey’s
post-test, data are pooled from three biological replicates. Error bars
represent mean ± s.d. **P < 0.01, ***P < 0.001.


Angiogenesis and, specifically, VEGFR signalling have been targeted 
in cancer therapies. Our findings suggested that these therapies might 
also be useful for mycobacterial infections. We chose the well-characterized
small molecule SU5416, a prototypical receptor tyrosine kinase
inhibitor, and pazopanib, a clinically relevant VEGFR inhibitor. Treatment
of infected animals with SU5416 or pazopanib prevented ectopic
angiogenesis around the forming granulomas in Tg(flk1:eGFP) zebra- 
fish and reduced net bacterial burdens (Fig. 3a and Extended Data Fig. 5a). 
Neither compound affected in vitro growth of M. marinum, suggesting 
that the effect on bacterial burden was achieved through targeting host
pathways (Extended Data Fig. 5b). We confirmed that the treatments 
were specifically targeting angiogenesis, since bacterial burdens were 
lowered only in trunk-infected and not in caudal-vein-infected zebra- 
fish (Extended Data Fig. 5c). Additionally, in trunk-infected animals, 
growth restriction did not occur until after the initiation of angiogen- 
esis at 4 dpi (Extended Data Fig. 5d). To determine whether VEGFR 
inhibition affected macrophage recruitment, we compared the associa- 
tion of bacterial foci with macrophages between control and pazopanib- 
treated larvae. We did not observe any differences in the proportion of 
macrophage-associated foci at time points between 4 and 6 dpi (Extended 
Data Fig. 5e). Together, these data suggest that VEGFR inhibition re- 
duces bacterial burden specifically through restriction of vascularization. 

To determine whether VEGFR inhibitors also reduced infection- 
induced vascular permeability, we measured permeability in treated and 
untreated animals matched for infection burden. We found a reduction 
in vascular leakiness in zebrafish treated with a VEGFR inhibitor (Ex- 
tended Data Fig. 5f). 

Angiogenesis is thought to have an important role in tumour metastasis.
Analogously, in long-term monitoring experiments, we observed
a decreased rate of M. marinum dissemination to distal sites in zebra- 
fish treated with a VEGFR antagonist compared with untreated animals 
(OR 0.27, 95% confidence interval 0.12-0.63) (Fig. 3b and Extended 
Data Fig. 5g). To investigate whether decreased dissemination was 
solely a consequence of reduced bacterial growth, we examined a hypoinflammatory
state caused by knockdown of an enzyme involved in eicosanoid
biosynthesis, Lta4h. lta4h knockdown increased burden, limited
angiogenesis and also decreased dissemination, suggesting a role for 
angiogenesis in dissemination independent of burden (Extended Data 
Fig. 5h). 

We hypothesized that reduced bacterial growth due to decreased ox- 
ygen availability may contribute to overall reduced burdens. Pazopanib 
treatment resulted in an increased number of phd3-positive granulomas 
(Fig. 3c and Extended Data Fig. 5i). Moreover, pazopanib treatment 
increased the effectiveness of metronidazole (Fig. 3d). These results sug- 
gest that angiogenesis is an important modulator of oxygen availabil- 
ity for infecting mycobacteria and that its limitation can enhance the 
efficacy of therapies targeting hypoxia. Metronidazole has only marginal
therapeutic efficacy in human tuberculosis; our results suggest
combining it or related compounds with VEGFR inhibitors that intensify
the hypoxic environment. Finally, we showed that targeting of
VEGFR signalling could complement the first-line antitubercular drug
rifampicin: a combination of rifampicin and SU5416 resulted in de- 
creased burden compared to either drug alone (Fig. 3e). 


Figure 4 | Inhibition of VEGFR signalling reduces M. marinum burden in
adult zebrafish. a, Survival analysis of adult zebrafish infected with
400 c.f.u. (red lines), 4,000 c.f.u. (blue lines) or 8,000 c.f.u. (green
lines) of M. marinum. Zebrafish are further grouped into control (dashed
lines) or pazopanib-treated (solid lines) groups. Log-rank test: 400 c.f.u.,
not significant; 4,000 c.f.u., P = 0.012; 8,000 c.f.u., P = 0.029.
b, Representative image of a necrotic granuloma from a 2 weeks
post-infection adult Tg(flk1:eGFP) zebrafish infected with
Cerulean-fluorescent M. marinum (cyan), and stained for hypoxyprobe (red)
and with 4’,6-diamidino-2-phenylindole (DAPI; blue). Image is
representative of granulomas found in 16 individual animals. c, Pooled
bacterial burden in pazopanib-treated adult zebrafish. Matched Student’s
t-test. d, Comparison of granulomas between control and pazopanib-treated
adult zebrafish scored for M. marinum burden as less than ten or more than
ten bacteria. Total number of zebrafish analysed: 4 (control),
4 (pazopanib). Scale bar, 100 µm. Error bars represent mean ± s.d.

We next addressed the therapeutic effectiveness of VEGFR inhibition
in adult animals. We infected zebrafish with a range of mycobacterial
doses from 400 to 8,000 colony-forming units (c.f.u.) via intraperitoneal
injection, treated them with pazopanib, and observed their survival. Over
17 days, pazopanib treatment increased survival in animals infected with
high doses (4,000 and 8,000 c.f.u.) of M. marinum, but there was not yet
appreciable mortality in the low-dose infection group (Fig. 4a). At a dose 
of 500 c.f.u., where significant mortality was not observed in the first 
3 weeks, we observed granulomas that were completely cellular as well as 
ones that had developed necrotic cores by 2 weeks post-infection (wpi) 
(Extended Data Fig. 6a). Many adult granulomas were hypoxic, as as- 
sessed by pimonidazole treatment and staining. Staining was largely 
specific to granulomas and was concentrated in the cellular rim of the 
necrotic area, a pattern similar to that seen in macaques (Fig. 4b and
Extended Data Fig. 6b). 

Pazopanib treatment increased the mean distance of granulomas to 
the nearest vasculature (Extended Data Fig. 6c). As in larvae, treatment 
resulted in reduced bacterial burdens, with a mean fourfold reduction 
relative to control animals over 2 weeks (Fig. 4c). In week-old estab- 
lished infections, pazopanib treatment for 1 week resulted in a mean 
eightfold reduction in burden (Extended Data Fig. 6d). 

After 6 weeks of treatment, vascularization was still significantly re- 
duced relative to controls (Extended Data Fig. 7a). Notably, there was 
an increased fraction of low-burden or sterile granulomas in the pazo- 
panib-treated zebrafish (Fig. 4d and Extended Data Fig. 7b). In addi- 
tion, the drug-treated animals displayed an increased fraction of hypoxic 
granulomas, and there was an association of hypoxic granulomas with 
low-burden lesions (Extended Data Fig. 7c, d). Many low-burden or ster- 
ilized hypoxic granulomas in the drug-treated zebrafish had acellular 
necrotic central areas (Extended Data Figs 6a and 7b). Studies of caseous 
tuberculous granulomas that are sterile in asymptomatic humans sug- 
gest that such an outcome is possible even in the normal course of 
tuberculosis.

We conclude that angiogenesis triggered by mycobacterial granulo- 
mas is an important feature of mycobacterial pathogenesis and has im- 
portant consequences for infection pathology and progression. We have 
shown that interception of this angiogenic program using host-directed 
therapies can limit mycobacterial disease. These findings suggest the 
potential utility of host-targeting anti-angiogenic agents as adjunctive 
therapies. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS


Zebrafish handling. All zebrafish husbandry and experimental procedures were 
performed in accordance and compliance with policies approved by the Duke Uni- 
versity Institutional Animal Care and Use Committee (protocol A180-11-07). Clutches 
of eggs were collected from natural spawning and raised in filtered fish system 
water at 28 °C. Pigment development was halted in 1 dpf embryos by the addition 
of 1-phenyl-2-thiourea (PTU; Sigma-Aldrich; final concentration 45 µg ml⁻¹).
Unless otherwise indicated, all zebrafish are from the wild-type AB strain.
Infection by microinjection. Embryos were anaesthetized at 2 dpf (or 1 dpf for 
morphants) with tricaine (MS-222; Sigma-Aldrich; final concentration 160 µg ml⁻¹)
and injected with approximately 200 c.f.u. M. marinum or E. coli in an injection 
bolus of 10-20 nl as shown in Fig. 1b. Infected embryos were then recovered back 
to filtered fish system water supplemented with PTU. Embryos that were physically 
damaged by injection handling were discarded and excluded from further analysis. 
Live imaging. Conventional and time-lapse fluorescent microscopy was carried out 
on a Zeiss Observer Z1 inverted microscope. Embryos were anaesthetized with tricaine
and mounted in 3% (w/v) methylcellulose for static microscopy. Embryos for
time-lapse microscopy were anaesthetized with 120 µg ml⁻¹ tricaine, mounted in
0.75% low melting point agarose in 96-well plates and immersed in filtered fish system
water supplemented with PTU. Confocal microscopy was performed with an
Olympus FV1000 confocal microscope. Images were processed with ImageJ (NIH),
Photoshop CS4 (Adobe) and Volocity 5.4 image analysis software (Improvision/ 
PerkinElmer Life and Analytical Sciences). 

Image analysis. Abnormal vascular length was measured as the two-dimensional 
length of vessels not seen in control embryos in Photoshop using the ruler tool. In- 
fection burden and neutrophil units were measured as the number of pixels above 
background per embryo in ImageJ using binary thresholding of single channel images
and the Analyze Particles function. Macrophage association was scored as having 
mpeg1:tdTomato-caax expression around sites of Wasabi-fluorescent M. marinum
expression. Dissemination was scored by comparing images of infected larvae at 4 
and 6 dpi to track M. marinum infection foci. 
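
The burden measurement described above reduces to counting pixels brighter than background in a single-channel image. The following is a minimal sketch of that idea in plain Python. It is an illustration only, not the ImageJ Analyze Particles workflow used in the study; the function name, the toy image and the threshold value are all hypothetical.

```python
def pixels_above_background(image, background):
    """Count pixels in a single-channel image brighter than `background`.

    image: list of rows, each row a list of pixel intensities
    background: intensity threshold (assumed to be chosen per experiment)
    """
    return sum(1 for row in image for value in row if value > background)


# Toy 3x3 single-channel image in arbitrary intensity units (hypothetical).
toy_image = [
    [0, 5, 12],
    [30, 8, 40],
    [3, 3, 50],
]
burden = pixels_above_background(toy_image, background=10)
print(burden)  # 4 pixels exceed the threshold of 10
```

Because the count depends directly on the threshold, the same background value would need to be applied to every embryo within an experiment for burdens to be comparable.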

Whole-mount in situ hybridization. Whole-mount in situ hybridization was carried
out essentially as described. Primers used for cloning vegfaa were previously
described. Primers used for cloning phd3 were: 5'-ATTCCTGTGGGCTTCTCA
AC-3' and 5’-ACACGAACCAAACTGCTCAC-3’. Images of stained embryos were 
collected on a Nikon AZ100 microscope. 

Construction of mpeg1:tdTomato-caax transgene. The previously characterized 
mpeg1 promoter was PCR amplified using the primers 5'-CCCAAACTCGAGT
TGTTGGAGCACATCTGACAT-3’ and 5'-GGGAGGAAGCTTTGTTTTGCTG
TCTCCTGCACT-3’. The product was subsequently cloned into p5E MCS using
XhoI and HindIII sites to generate p5E mpeg1.

To construct a gateway-compatible tdTomato-CAAX construct, the sequence 
of tdTomato was PCR amplified with the primers 5'-GGGGACAAGTTTGTAC 
AAAAAAGCAGGCTGGACCATGGTGAGCAAGGGCGAGGAG-3’ and 5’-GG 
GGACCACTTTGTACAAGAAAGCTGGGTAGATCTACTTGTAGAGCTCGT 
CCATGCCG-3’. These primers introduced a silent SacI site in the 3’ end of the 
tdTomato coding sequence and a BglII site downstream of the tdTomato stop codon. 
The PCR product was subsequently cloned by Gateway recombination (Invitrogen) 
into pDONR221 to generate pME tdTomato. 

To generate pME tdTomato-CAAX, pME tdTomato was digested with SacI and
BglII, and a linker sequence encoding the human H-Ras prenylation signal was
cloned into the plasmid. Primer sequences for the linker
were as follows: top strand, 5’-CTACAAGAAGCTGAACCCTCCTGATGAGA 
GTGGCCCCGGCTGCATGAGCTGCAAGTGTGTGCTCTCCTA-3’; bottom 
strand, 5'-GATCTAGGAGAGCACACACTTGCAGCTCATGCAGCCGGGGC 
CACTCTCATCAGGAGGGTTCAGCTTCTTGTAGAGCT-3’. 
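
The two linker oligonucleotides are designed to anneal into a duplex with single-stranded ends. As a quick sanity check, the sketch below locates the reverse complement of the top strand within the bottom strand and reports the unpaired bases left at each end of the bottom strand. The sequences are copied from the text above; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bottom_strand_overhangs(top, bottom):
    """Return the (5' end, 3' end) unpaired bases of `bottom`.

    Returns None if the reverse complement of `top` is not found in `bottom`.
    """
    core = reverse_complement(top)
    start = bottom.find(core)
    if start < 0:
        return None
    return bottom[:start], bottom[start + len(core):]


# Linker strands as listed in the Methods, written 5' to 3'.
TOP = (
    "CTACAAGAAGCTGAACCCTCCTGATGAGA"
    "GTGGCCCCGGCTGCATGAGCTGCAAGTGTGTGCTCTCCTA"
)
BOTTOM = (
    "GATCTAGGAGAGCACACACTTGCAGCTCATGCAGCCGGGGC"
    "CACTCTCATCAGGAGGGTTCAGCTTCTTGTAGAGCT"
)

print(bottom_strand_overhangs(TOP, BOTTOM))  # ('GATC', 'AGCT')
```

The unpaired GATC at the 5' end and AGCT at the 3' end of the bottom strand are the overhangs expected for ligation into BglII-cut and SacI-cut ends, respectively, which is consistent with the digest described above.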

The mpeg1:tdTomato-caax transgene construct was subsequently constructed by
recombining p5E mpeg1, pME tdTomato-CAAX and p3E polyA into pDestTol2pA2
by Gateway multisite recombination (Invitrogen) as previously characterized, to
generate pDestTol2; mpeg1:tdTomato-caax.
Construction of mfap4:turquoise transgene. The mfap4 promoter was PCR 
amplified using the primers 5’-CATGTTCTCGAGGCGTTTCTTGGTACAGCT 
GG-3' and 5'-CATGTTGGATCCCACGATCTAAAGTCATGAAGAAAGA-3’. 
The product was subsequently cloned into p5E MCS using XhoI and BamHI sites.
The native start codon was mutated using the primer 5’-CTGAGCTGTTGAGG 
AGAGAGTGAGAAG(ATT)GCAGTAAGTTCTGTGGCTGTTTTATTCC-3’ by 
inverse PCR with the backbone primers 5'-GTAAGTTCTGTGGCTGTTTTAT 
TC-3’ and 5'-CTTCTCACTCTCTCCTCAACAG-3’. The final p5E mfap4 was then 
assembled by Gibson assembly using this single-stranded oligonucleotide and the 
backbone. 

To generate pME Turquoise2, we used the primers 5’-GGGGACAAGTTTGT 
ACAAAAAAGCAGGCTGGACCATGGTGAGCAAGGGCGAGGAG-3’ and 5'- 
GGGGACCACTTTGTACAAGAAAGCTGGGTTTACTTGTACAGCTCGTCC
AT-3’ to amplify off pmTurquoise2 H2A (Addgene plasmid #36202; ref. 35). The
PCR product was subsequently cloned into pDONR221 by BP cloning (Invitrogen) 
to generate pME Turquoise2. 

The mfap4:turquoise transgene construct was subsequently constructed by
recombining p5E mfap4, pME Turquoise2 and p3E polyA into pDestTol2pA2 to
generate pDestTol2; mfap4:turquoise.
Construction of lyzC:ntr-p2A-lanYFP transgene. The previously characterized 
lyzC promoter was PCR amplified from the bacterial artificial chromosome (BAC) 
CH211-250A24 (BACPAC resources, Children’s Hospital of Oakland Research 
Institute) using the primers 5'-CCCATAGGTACCCTGATCACTGGTGTAGTG 
AACTC-3' and 5’-CCCAAACTCGAGATTGTATCACTGCTGATATCTGCTT 
T-3’. The product was subsequently cloned into p5E MCS using KpnI and XhoI
sites to generate p5E lyzC. 

To generate pME Ntr, we used the primers 5'-GGGGACAAGTTTGTACAA 
AAAAGCAGGCTGGACCATGGCCTCCGGACTCAGATCTCGAGC-3’ and 5’- 
GGGGACCACTTTGTACAAGAAAGCTGGGTCCACTTCGGTTAAGGTGAT 
GTTTTGC-3’ to amplify off pminiTol2-YFP-NTR (gift from L. Ramakrishnan). 
The PCR product was subsequently cloned into pDONR221 by BP cloning (Invi- 
trogen) to generate pME Ntr. 

To generate p3E p2A-lanYFP, we used the primers 5'-GGGGACAGCTTTC 
TTGTACAAAGTGGTTGGATCCTTCAGTCTCGAGATGGTGAGCAAGGGC 
GAGGAG-3’ and 5'’-GGGGACAACTTTGTATAATAAAGTTGTTACTTGTAC 
AGCTCGTCCAT-3’ to amplify the dimeric form of lanYFP from pNCS-dlanYFP 
(Allele Biotechnology). The PCR product was subsequently cloned into pDONR 
P2R-P3 by BP cloning (Invitrogen) to generate p3E lanYFP. p3E lanYFP was subsequently
digested with BamHI and XhoI and ligated with annealed oligonucleotides
encoding the p2A sequence (GATCcGGAAGCGGAGCTACTAACTTCAG
CCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTC and 
TCGAgAGGTCCAGGGTTCTCCTCCACGTCTCCAGCCTGCTTCAGCAGG 
CTGAAGTTAGTAGCTCCGCTTCCG) to generate p3E p2A-lanYFP.

The lyzC:ntr-p2A-lanYFP transgene construct was subsequently constructed by 
recombining p5E lyzC, pME Ntr and p3E p2A-lanYFP into pDestTol2pA2 to gen- 
erate pDestTol2; lyzC:ntr-p2A-lanYFP. 

Nucleic acid microinjection. Tol2 transposase was generated from T3TS-Tol2 
(ref. 37) using the mMessage mMachine T3 kit (Invitrogen). To generate trans- 
genic zebrafish, embryos at the one-cell stage were injected with approximately 1 nl 
of transgenesis mixture consisting of 25 ng µl⁻¹ transposase RNA and 50 ng µl⁻¹
pDestTol2; transgenesis construct. Positive embryos were selected by fluorescence 
microscopy, raised to adulthood and transgenic founders were subsequently identified. 
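
For orientation, the injected volume and concentrations above translate into picogram quantities per embryo. The short sketch below does the unit conversion; the function name is hypothetical and the 1 nl volume is the approximate figure stated in the text.

```python
def mass_delivered_pg(concentration_ng_per_ul, volume_nl):
    """Mass delivered in picograms.

    ng/µl multiplied by nl gives pg directly, because 1 nl is 1e-3 µl
    and 1 ng is 1e3 pg, so the two conversion factors cancel.
    """
    return concentration_ng_per_ul * volume_nl


rna_pg = mass_delivered_pg(25, 1)   # transposase RNA per embryo
dna_pg = mass_delivered_pg(50, 1)   # transgenesis construct per embryo
print(rna_pg, dna_pg)  # 25 50
```

Each embryo therefore receives roughly 25 pg of transposase RNA and 50 pg of construct DNA, subject to the usual variation in injected bolus size.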

The Pu.1/Spi1 morpholino sequence was 5'-GATATACTGATACTCCATTG
GTGGT-3’ and the Lta4h morpholino sequence was 5'-AGCTAGGGTCTGAAA
CTGGAGTCAT-3’. Morpholinos were injected at 10-60 µM.

Drug treatments. Metronidazole (M1547; Sigma-Aldrich; final concentration 
5 mM) was dissolved in water; the relatively high concentration of metronidazole 
seems to be a function of bioavailability in zebrafish as higher concentrations are 
routinely required for nitroreductase-mediated cellular ablation studies. Pazopanib
(sc-364564; Santa Cruz Biotechnology; final concentration 250 nM for larvae, 1 µM
for adults) and SU5416 (S8442; Sigma-Aldrich; final concentration 250 nM) were 
dissolved in DMSO. Metronidazole, pazopanib and SU5416 were added immedi- 
ately after infection and refreshed every 2 days for the duration of the experiment. 
Randomization into drug treatment groups was achieved by random selection of 
infected zebrafish from a single pool before addition of drugs. 
Microangiography. To detect vascular leakiness in wild-type and Tg(flk1:eGFP)
embryos, embryos were anaesthetized in tricaine water and injected with a 10 nl bolus 
of dextran-Texas Red 70,000 MW (D-1830; Life Technologies; 1 mg ml⁻¹ final concentration)
into the posterior section of the dorsal aorta or posterior cardinal vein.
This injection location avoided injection-trauma-induced tissue leakage occurring 
near M. marinum lesions. Injected embryos were rinsed in tricaine water and imme- 
diately mounted in methylcellulose for fluorescent microscopy. Vascular leakage 
was calculated as a ratio of intersomitic dextran-Texas Red signal divided by aortic 
dextran-Texas Red signal. 
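
The leakage metric defined above is a simple ratio of two fluorescence measurements. A minimal sketch follows; the function name and the example intensities are hypothetical, and how each signal is measured (region of interest, background subtraction) is not specified in the text.

```python
def vascular_leakage_ratio(intersomitic_signal, aortic_signal):
    """Ratio of intersomitic to aortic dextran-Texas Red signal."""
    if aortic_signal <= 0:
        raise ValueError("aortic signal must be positive")
    return intersomitic_signal / aortic_signal


# Hypothetical intensities in arbitrary units.
print(vascular_leakage_ratio(150.0, 600.0))  # 0.25
```

Normalizing to the aortic signal corrects for embryo-to-embryo differences in the amount of dye injected, so a higher ratio reflects more dye escaping into the tissue between somites.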

Combined drug treatments. Rifampicin (R3501; Sigma-Aldrich; final concentration
50 µM) was dissolved in DMSO. To achieve suboptimal VEGFR inhibition,
SU5416 was added immediately after infection at a final concentration of
200 nM and refreshed at 3 dpi. Rifampicin was added at 3 dpi and not refreshed for 
the duration of the experiment. 

Metronidazole (final concentration 5 mM) and pazopanib (suboptimal final con- 
centration 200 nM) were added immediately after infection and refreshed every 
2 days for the duration of the experiment. 

Randomization into drug treatment groups was achieved by random selection 
of infected embryos from a single pool prior to addition of drugs; blinding was not 
performed for subsequent quantitation. 
Adult infections. Adult zebrafish were infected with approximately 500 c.f.u. of
fluorescent M. marinum via intraperitoneal injection. Zebrafish were maintained 
in beakers in a dedicated incubator at 28 °C with a 14:10 h light:dark cycle. Pazopanib
was added to a final concentration of 1 µM immediately after infection and
refreshed every 2 days for the duration of the experiment. 

Randomization into drug treatment groups was achieved by random selection of 
infected fish from a single pool prior to addition of drugs; blinding was not per- 
formed for subsequent quantitation. 

Bacterial recovery from infected adults. Infected adult zebrafish were pre-treated 
with 25 µg ml⁻¹ hygromycin to reduce microbiota load for 2 h before harvesting.
Zebrafish were euthanized by tricaine overdose and homogenized by bead mill for 
three bursts of 15 s. Homogenate was plated on Middlebrook 7H10 (262710; Difco) 
supplemented with OADC, hygromycin (H0654; Sigma-Aldrich, 50 mg l⁻¹) and
amphotericin B (SV3007801; Thermo Scientific, 10 mg l⁻¹). Plates were grown at
30°C for 10-14 days until fluorescent colonies could be counted. 

Hypoxyprobe staining. Infected adult zebrafish were injected with 15 µl of a
10 mg ml⁻¹ pimonidazole solution (HP7; Hypoxyprobe) every 2 days from 8 dpi
to 14 dpi. Zebrafish were euthanized by tricaine overdose at 14 dpi, fixed in 4% PFA,
decalcified in 0.5 M EDTA and cryosectioned. Frozen sections were stained with
4.3.1.3 mouse DyLight 549-MAb (HP7; Hypoxyprobe) or with unconjugated 4.3.1.3
mouse monoclonal antibody and secondary detection was carried out with goat 
anti-mouse Alexa-Fluor 647 (A-21235; Life Technologies) to detect hypoxic cells. 
Statistics. Data are presented as mean ± s.d. Experiments were analysed with the
statistical tests indicated in figure legends using Prism 5 (Graphpad). Unpaired
Student’s t-tests were performed unless otherwise indicated. For ANOVA analyses
with Tukey’s post-test, P values are indicated as follows: *P < 0.05, **P < 0.01,
***P < 0.001.
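
Some comparisons in the figure legends use Student's t-test with Welch's correction, which does not assume equal variances between groups. As a hedged illustration of what that correction involves, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. It is not the Prism procedure used for the analyses, it stops short of a P value (which would require the t distribution), and the function name and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical measurements in arbitrary units, NOT data from the paper.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, d.f. = {dof:.2f}")
```

When the two groups have similar variances and sizes, the degrees of freedom approach those of the standard unpaired test; they shrink as the variances diverge.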
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Extended Data Figure 1 | Angiogenesis in the zebrafish M. marinum 
infection model. a, Image of 6 dpi Tg(mfap4:turquoise™””) larvae infected with 
M. marinum SM2 pMAP49::Venus. Blue arrowheads indicates site of 
granuloma with induced expression of Venus from phagocytosed M. marinum. 
White arrowheads indicate sites of extracellular M. marinum growth detected 
by constitutive DsRed expression but no macrophage-induced Venus 
expression. Image is representative of granulomas found in five individual 
animals. b, Time-lapse images of Cerulean-fluorescent M. marinum 
dissemination from an established granuloma into the adjacent intersegmental 
vessel in a Te(flkl:eGFP, mpegl:tdTomato-caax*) double-transgenic larva 
where bacterial are labelled blue, blood vessels are labelled green and 
macrophages are labelled red. Yellow arrow tracks a single infected macrophage 
egressing the established granuloma and entering the vasculature. Images are 
representative of macrophage behaviour in three individual animals. c, Plots of
vessel growth kinetics from three individual branches in individual
Tg(flk1:eGFP) larvae. Videos of each larva analysed are available in 
Supplementary Videos 6 and 7 (left), and 8 and 9 (right). d, Time-lapse images 
of nuclear division during vascular growth in a single Tg(fli1a:eGFP-nls) larva.
Blue arrowhead indicates nucleus of interest. Images are representative of 
nuclear division in ten individual animals. Video of nuclear division is available 
in Supplementary Video 10. e, Three-dimensional rendering of recruited 
blood vessels in a Tg(flk1:eGFP) larva infected with Tomato-fluorescent
M. marinum originating from arterial and venous ISVs as indicated by red and
blue arrows, respectively. Image is representative of ten individual animals.
f, Extended exposure images of blood flow in Tg(flk1:eGFP, gata1:DsRed)
larvae. Blue arrows indicate blood flow through ectopic vessels. Images are
representative of blood flow in 20 individual animals. Scale bars, 100 µm.
Extended Data Figure 2 | Formation of ectopic vasculature is dependent on
granuloma formation. a, Length of abnormal vasculature in Tg(flk1:eGFP) 
larvae injected with PBS, live M. marinum, heat-killed M. marinum and E. coli. 
One-way ANOVA with Tukey’s post-test, data are representative of two 
biological replicates. b, Recruitment of vasculature by intracellular and 
extracellular foci of M. marinum. Total number of foci analysed: 4 dpi, 221 
intracellular, 105 extracellular; 5 dpi, 71 intracellular, 26 extracellular; and 6 dpi, 
131 intracellular, 50 extracellular. Fisher’s exact test. c, Comparative images of 
5 dpf control and Pu.1 morphant Tg(mpeg1:tdTomato-caax) larvae. White
arrowhead indicates comparative locations within the caudal haematopoietic
tissue. Blue arrowhead indicates intestinal and yolk sac autofluorescence.
Scale bar, 100 µm. Images are representative of transgene expression in 20
animals per treatment group. d, e, Bacterial burden in 5 dpi control and
Pu.1 morphant larvae (d), and 4 dpi larvae infected with wild-type (WT) or
ΔESX1 Tomato-fluorescent M. marinum (e). Student’s t-test with Welch’s
correction, all data are pooled from two biological replicates. Error bars
represent mean ± s.d. **P < 0.01, ***P < 0.001.
Extended Data Figure 3 | Granuloma vascularization correlates with
granuloma size. a, Plot of abnormal vasculature length and bacterial burden
for individual foci of infection measured by fluorescent pixel count (FPC) in
Tg(flk1:eGFP) larvae. Slope significantly non-zero, P < 0.0001, linear regression;
data are pooled from three biological replicates. b, Whole-mount in situ 
hybridization detection of phd3 expression. Images are representative of 
phd3 staining in uninfected (20/20), caudal vein (CV)-infected (20/20) and 
trunk-infected (7/20) zebrafish. c, Left, images of Tg(lyzC:ntr-p2A-lanYFP)
larvae treated with metronidazole as indicated. Green arrowheads indicate
comparative locations within caudal haematopoietic tissue. Images are median
images from experimental groups: control, n = 21; 100 µM, n = 22; 1 mM,
n = 24; and 10 mM, n = 19. Right, quantification of neutrophil numbers by
area of fluorescence in Tg(lyzC:ntr-p2A-lanYFP) larvae treated with
metronidazole from 2 dpf to 6 dpf. Error bars represent mean ± s.d.
Extended Data Figure 4 | M. marinum infection induces expression of
vegfaa. a, Whole-mount in situ hybridization detection of vegfaa expression in
uninfected, caudal vein (CV)-injected and trunk-injected larvae. Red arrow
indicates sites of infection with vegfaa expression. Images are representative of
20 animals per treatment group. b, Representative histological sections of
whole-mount in situ hybridization detected vegfaa expression in control
infected larvae and a Pu.1 morpholino (MO)-treated infected larva. Black
arrows indicate sites of infection identified by increased nuclear fast red
staining density. Images are representative of ten animals per treatment group.
c, Microangiography of Tg(flk1:eGFP) larvae imaged at 1, 5 and 10 min
post-injection (mpi). Top panels are representative of uninfected larvae,
bottom panels are representative of larvae infected with unlabelled
M. marinum. Images are representative of ten animals per treatment group.
Scale bars, 100 µm.
Extended Data Figure 5 | Pazopanib and SU5416 reduce M. marinum
pathogenicity in zebrafish larvae. a, Left, comparative images of 
Tg(flk1:eGFP) larvae infected with Tomato-fluorescent M. marinum and 
treated with DMSO, pazopanib or SU5416. Top panels depict Tomato- 
fluorescent M. marinum and labelled vasculature. Bottom panels depict only 
Tg(flk1:eGFP)-labelled vasculature. Blue arrowheads indicate somites with 
ectopic vasculature. Images are representative of 20 animals per treatment 
group. Right, length of abnormal vasculature in pazopanib- or SU5416-treated 
larvae. Student’s t-test, data are pooled from two or three biological replicates, 
respectively. b, Growth curve of Tomato-fluorescent M. marinum in 7H9 
broth culture supplemented with pazopanib or SU5416. Data are representative 
of two biological replicates. c, Bacterial burden in caudal-vein-infected larvae 
treated with either pazopanib or SU5416. Student's t-test, data are pooled 
from two biological replicates. d, Longitudinal bacterial burden from 2 to 6 dpi 
in trunk-infected larvae treated with pazopanib. One-way ANOVA with 
Tukey’s post-test. NS, not significant; n = 14 individuals per group.
e, Comparison of M. marinum foci between control and pazopanib-treated
larvae scored by association with macrophages. Fisher’s exact test, n = 40
individuals per group. f, Left, microangiography of larvae infected with
Cerulean-fluorescent M. marinum, injected with high-molecular-weight
dextran-Texas Red at 6 dpi and imaged at 5 minutes post dextran injection
(mpi). Top panels depict Cerulean-fluorescent M. marinum and dextran-Texas
Red, bottom panels depict only dextran-Texas Red in vasculature and
leakage around sites of infection. Green arrowheads indicate somites with the
highest leakage signals in infected larvae. Images are median images from graph
on right. Right, quantification of vascular leakage in uninfected, DMSO-
and pazopanib-treated larvae. One-way ANOVA with Tukey’s post-test, data
are representative of two biological replicates. g, Dissemination of Wasabi-
fluorescent M. marinum in larvae treated with DMSO or pazopanib. Red
arrowheads indicate contained foci of infection that remain in the same
location throughout the course of infection, blue arrowheads indicate
disseminated foci of infection. Images are representative of data in Fig. 3b.
h, Bacterial burden (left), length of abnormal vasculature (middle) and
dissemination (right) in 5 dpi control and Lta4h morphant larvae. i, Whole-
mount in situ hybridization detection of phd3 expression in uninfected (white
arrow) and M. marinum-infected zebrafish larvae. Blue arrows indicate
phd3-expression-positive larvae with purple staining, red arrow indicates site
of bacterial infection with no purple staining, indicating phd3-expression-
negative larva. Image is representative of data in Fig. 3c. Scale bars, 100 µm.
Error bars represent mean ± s.d. *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 6 | Effects of pazopanib treatment are reproduced in
adult zebrafish infections. a, Images of non-necrotic (left) and necrotic 
(right) Tomato-fluorescent M. marinum granulomas stained with DAPI 
(top) and haematoxylin and eosin (bottom). White arrows indicate non- 
necrotic granuloma, yellow arrows indicate necrotic granuloma. Images
are representative of granulomas found in eight individual animals.
b, Representative image of a necrotic granuloma from a negative control, not
injected with pimonidazole, 2 wpi adult Tg(flk1:eGFP) zebrafish infected
with cerulean-fluorescent M. marinum (cyan), and stained for hypoxyprobe
(red) and with DAPI (blue). Images are representative of granulomas found in
two individual animals. c, Left, representative image of Tomato-fluorescent
M. marinum granuloma in Tg(flk1:eGFP) zebrafish stained with DAPI. White
arrow indicates granuloma, yellow line indicates path measured for distance
between granuloma and nearest vasculature (indicated by green arrow). 
Image is representative of data presented on the right, in panel d and Extended 
Data Fig. 7a. Right, distance between granulomas and nearest vasculature 
measured in 2 wpi adult Tg(flk1:eGFP) zebrafish. Total number of zebrafish 
analysed: 4 (control), 4 (pazopanib). d, Left, distance between granulomas and 
nearest vasculature measured in 2 wpi adult Tg(flk1:eGFP) zebrafish treated
with pazopanib for 1 week. Total number of zebrafish analysed: 2 (control),
2 (pazopanib). Right, bacterial burden in 2 wpi adult zebrafish treated
with pazopanib for 1 week. Student’s t-test, data are pooled from three
biological replicates.
Extended Data Figure 7 | Pazopanib increases the frequency of hypoxic and
low-burden granulomas. a, Distance between granulomas and nearest
vasculature measured in 6 wpi adult Tg(flk1:eGFP) zebrafish. Total number
of zebrafish analysed: 4 (control), 4 (pazopanib). Green dot indicates outlier
that was omitted from statistical analysis. b, Images of low burden/hypoxic
(left) and high burden/non-hypoxic (right) granulomas in zebrafish that
were injected with pimonidazole. Asterisks indicate Tomato-fluorescent
M. marinum, arrows indicate areas of hypoxia in granuloma. Images are
representative of data in c, d and Fig. 4d. c, Comparison of granulomas between
control and pazopanib-treated adult zebrafish scored for pimonidazole
staining. Total number of zebrafish analysed: 4 (control), 4 (pazopanib).
d, Comparison of granulomas between non-hypoxic and hypoxic granulomas
in control and pazopanib-treated adult zebrafish scored for M. marinum
burden. Total number of zebrafish analysed: 4 (control), 4 (pazopanib).
Scale bars, 100 µm. Error bars represent mean ± s.d.
p63+ Krt5+ distal airway stem cells are essential for lung regeneration


Wei Zuo, Ting Zhang, Daniel Zheng An Wu, Shou Ping Guan, Audrey-Ann Liew, Yusuke Yamamoto, Xia Wang, Siew Joo Lim,
Matthew Vincent, Mark Lessard, Christopher P. Crum, Wa Xian & Frank McKeon


Lung diseases such as chronic obstructive pulmonary disease and
pulmonary fibrosis involve the progressive and inexorable destruction
of oxygen exchange surfaces and airways, and have emerged as a
leading cause of death worldwide. Mitigating therapies, aside from
impractical organ transplantation, remain limited and the possibility
of regenerative medicine has lacked empirical support. However,
it is clinically known that patients who survive sudden, massive loss
of lung tissue from necrotizing pneumonia or acute respiratory
distress syndrome often recover full pulmonary function within six
months. Correspondingly, we recently demonstrated lung regeneration
in mice following H1N1 influenza virus infection, and linked
distal airway stem cells expressing Trp63 (p63) and keratin 5, called
DASC(p63/Krt5), to this process. Here we show that pre-existing, intrinsically
committed DASC(p63/Krt5) undergo a proliferative expansion
in response to influenza-induced lung damage, and assemble into
nascent alveoli at sites of interstitial lung inflammation. We also show
that the selective ablation of DASC(p63/Krt5) in vivo prevents this regeneration,
leading to pre-fibrotic lesions and deficient oxygen exchange.
Finally, we demonstrate that single DASC(p63/Krt5)-derived pedigrees
differentiate to type I and type II pneumocytes as well as bronchiolar
secretory cells following transplantation to infected lung and also
minimize the structural consequences of endogenous stem cell loss
on this process. The ability to propagate these cells in culture while
maintaining their intrinsic lineage commitment suggests their potential
in stem cell-based therapies for acute and chronic lung diseases.

H1N1 influenza virus infection of murine lung triggers a process of
leukocyte infiltration and lung damage similar to that of acute respiratory
distress syndrome (ARDS) (Fig. 1a). Damaged regions are marked by
densely packed, CD45+ neutrophils and macrophages and an absence
of markers for type I (Pdpn+) and type II (SPC+) pneumocytes (Fig. 1a;
Extended Data Fig. 1a). Despite the local destruction of alveoli, these
same regions harbour discrete clusters of p63+ Krt5+ epithelial cells
proposed to be the early stages of de novo alveoli formation (Fig. 1a).
Three-dimensional reconstruction of serial sections of lungs at 15 days
post-infection (dpi) reveals a broad distribution of Krt5+ cells along the
axis of the bronchioles (Fig. 1b; Supplementary Video 1).

To decipher the origin of these p63+ Krt5+ cells appearing in response
to lung damage, we performed genetic lineage-tracing of Krt5+ cells
starting before infection through the cycle of lung damage and resolution.
Mice expressing a tamoxifen-dependent lacZ gene under the
control of the Krt5 promoter (Tg(KRT5-CreERT2); ROSA26-lsl-lacZ)
were treated with tamoxifen before intratracheal delivery of H1N1 influenza
virus (Fig. 1c; Extended Data Fig. 1b). At 0 dpi, lacZ activity was
not detectable in whole-mount lung and yet Krt5+ cells were evident in
distal lung, as clusters of peribronchiolar Krt5+ p63+ cells were observed
in one of three to four consecutive sections of lung. Approximately 50%
of these cells expressed Escherichia coli-specific β-galactosidase (7 clusters
of p63+ cells were observed in 25 slides with a total of 39 p63+ cells, of
which 19 were β-galactosidase positive; Fig. 1c). No labelling of other
cell types in the lung was observed. In addition, colonies of distal airway
stem cells (DASC) with long-term self-renewal (passage 4) were generated
from three 0 dpi mice and stained with antibodies to Krt5
Figure 1 | Lineage tracing of Krt5+ cells following viral infection. a, Left,
mouse lung before and after viral infection. Right, immunofluorescence images
of infected lung of anti-Krt5 (red), anti-Pdpn (green) with DNA counterstain
(DAPI, blue). Scale bar, 1 mm. Insets, high magnification of indicated regions.
n = 10 mice. Scale bars, 100 µm. b, Three-dimensional reconstruction of
anti-Krt5 (red) from serial sections of infected lung (bronchioles, blue). Grid,
100 × 100 µm. c, Schematic of lineage tracing experiment following tamoxifen
treatment to reveal lacZ expression 0, 9, 15 and 60 days post infection (n = 3
mice for each time point and control). At 0 dpi, rare clusters of p63+ cells in
bronchioles express E. coli β-galactosidase (left top, arrows). Scale bar, 20 µm.
Colony of Krt5+ DASC co-expressing β-galactosidase at 0 dpi (left bottom).
At 9, 15 and 60 dpi, whole lungs stained by X-gal. d, Immunofluorescence in
60 dpi lung sections with indicated antibodies. TI, TII indicate type I and II
pneumocytes, respectively. Scale bar, 100 µm.
and β-galactosidase. All 256 colonies stained with anti-Krt5 antibodies
and 132 co-expressed β-galactosidase (Fig. 1c; Extended Data Fig. 1c).
At 9 dpi, lacZ activity was subtle and restricted to the airways (Fig. 1c),
consistent with our previous observations that at 9 dpi, p63+ Krt5+
cells had accumulated within mouse bronchioles with no evidence of
migration to interstitial regions. By 15 dpi, however, the lacZ signal
was significantly more robust and included broader regions of interstitial
lung (Fig. 1c; Extended Data Fig. 1d). At 60 dpi, the lacZ signal
was distributed along the conducting airways and the surrounding interstitial
regions, suggesting a progressive process (Fig. 1c). Importantly,
no lacZ activity was detected in the lungs of tamoxifen-treated mice in
the absence of infection, indicating that the robust signal we observed
was in response to lung damage (Extended Data Fig. 1e). Histological
analysis of the lacZ-positive regions of lung from infected mice revealed
broad interstitial areas of staining corresponding to alveoli and that
72 ± 7% of 1,051 lineage-labelled cells expressed type I (1H8+ and Pdpn+)
or type II (SPC+) pneumocyte markers with the remainder being secretory
cells in the bronchioles (n = 3 mice; Fig. 1d; Extended Data Fig. 1d, f).
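
The "approximately 50%" labelling figure can be checked directly from the counts reported above (19 of 39 p63+ cells in vivo, 132 of 256 colonies in vitro). The following is a minimal arithmetic sketch; the helper name `labelled_fraction` is hypothetical and not part of the original analysis.

```python
def labelled_fraction(labelled: int, total: int) -> float:
    """Return the fraction of scored units that carry the lineage label."""
    if total <= 0 or not 0 <= labelled <= total:
        raise ValueError("counts must satisfy 0 <= labelled <= total, total > 0")
    return labelled / total


# Counts quoted in the text.
cells_in_vivo = labelled_fraction(19, 39)       # p63+ cells at 0 dpi
colonies_in_vitro = labelled_fraction(132, 256)  # Krt5+ DASC colonies

print(f"in vivo:  {cells_in_vivo:.1%}")      # prints "in vivo:  48.7%"
print(f"in vitro: {colonies_in_vitro:.1%}")  # prints "in vitro: 51.6%"
```

Both estimates sit close to one half, which is consistent with the labelling efficiency the text describes.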

To generate a mouse model in which DASC(p63/Krt5) could be conditionally
ablated, we engineered the human diphtheria toxin receptor
(DTR) into the Krt6a (Krt6) locus (Krt6-DTR; Fig. 2a; Extended Data
Fig. 2a, b) as the Krt6 gene becomes activated specifically in DASCs
at approximately 8-10 dpi, rather than the p63 or Krt5 genes, which are
expressed in stem cells of many stratified epithelia. Consistently,
DASC(p63/Krt5) from the Krt6a-DTR mice were found to co-express
DTR and Krt6a in vivo and in vitro (Extended Data Fig. 2c, d). To test
this ablation model in vivo, we infected the Krt6-DTR mice with influenza
virus and injected diphtheria toxin at 8 dpi (Fig. 2a). By 15 dpi,
exposure to toxin resulted in a rapid loss of interstitial clusters of
Krt5+ Krt6+ cells (Fig. 2b, c), indicating that we had generated a highly
efficient ablation model. Compared to wild-type controls, Krt6-DTR
mice lost 90% of Krt5+ cells and over 99% of Krt6+ cells following
diphtheria toxin treatment (Fig. 2c).

We next evaluated the effect of ablating DASC(p63/Krt5) on the process
of lung regeneration itself. Focal infiltrates of leukocytes appearing as
haematoxylin-eosin staining densities form at about 10-15 dpi in wild-type
mice and are typically resolved over the next 15-45 days (Fig. 2d, e).
However, in diphtheria-toxin-treated Krt6-DTR mice, these densities
fail to resolve over time (Fig. 2d, e). The persistent damage in DASC(p63/Krt5)-ablated
lungs is also evident by whole-lung genome expression analyses
(Extended Data Fig. 3a, false discovery rate (FDR) q < 0.001). Histological
comparisons of the persistent densities at 30 dpi revealed the possible
basis for this difference. Although wild-type lungs at 30 dpi still
display some lung densities, nearly all of these densities were negative
Figure 2 | Conditional ablation of activated
DASC(p63/Krt5). a, Modified Krt6a locus driving
the human diphtheria toxin receptor and
experimental scheme. DTox, diphtheria toxin.
b, Immunofluorescence images of distal lung at
15 dpi with indicated diphtheria toxin (+/−DTox)
condition. c, Quantification of Krt5+ and Krt6+
cells in DTox-treated mice. n = 2 mice per group,
6 sections covering the whole lung for each mouse.
d, Lung sections from indicated mice following
influenza virus infection and treatment with DTox.
e, Morphometric analysis of interstitial densities
in sections of lung (n = 3 mice per condition and
time). Error bars, s.e.m. f, Anti-Pdpn (red)
immunofluorescence of lung densities to reveal
type I pneumocytes counterstained with DAPI
(blue). Scale bar, 100 µm. H&E, haematoxylin and
eosin. g, Peripheral capillary oxygen saturation
(SpO2) values obtained by pulse oximetry (WT,
n = 3; DTR + DTox, n = 4) at indicated times.
Error bars, s.e.m. **P < 0.01 for −DTox versus
+DTox. h, Gene ontology classes and fold-change
of DASC(p63/Krt5) ablation versus normal mouse lung
at 30 dpi. Below, heat map of differentially
expressed (P < 0.05) genes involved in pre-fibrosis,
alveolar structure and vasculature.
for the CD45 leukocyte marker and instead possess unusual networks
of Pdpn+ type I pneumocytes (Fig. 2f; Extended Data Fig. 3b). These
networks of type I pneumocytes lacked markers of type II pneumocytes
(Extended Data Fig. 4). Interestingly, these networks stain positive for
other type I pneumocyte markers (Aqp5) but not all (for example, Hopx;
Extended Data Fig. 4), suggesting the possibility that the type I pneumocytes
in these networks are undergoing a maturation process. Remarkably,
similar alveoli-like structures formed by type I pneumocytes lacking
type II pneumocytes in anatomical analyses of mice recovering
from infection by the NWS influenza A virus were reported nearly
40 years ago. In contrast to the type I pneumocyte networks in persistent
densities of wild-type mice, DASC(p63/Krt5)-ablated mice showed
no such alveolar networks at 30 dpi (Fig. 2f) and instead maintained
persistent infiltration of leukocytes evidenced by anti-CD45 staining
(Extended Data Fig. 3b).

We next asked if the loss of DASC(p63/Krt5) also affected aspects of
pulmonary function in these mice. Using pulse oximetry to assess peripheral
capillary oxygen saturation (SpO2), we found that the normal
95% SpO2 values plummeted to approximately 70% in normal and
DASC(p63/Krt5)-ablated mice eight days following infection. However, the
normal mice recovered to 90% SpO2 by 40 dpi, whereas the DASC(p63/Krt5)-ablated
mice only reached SpO2 values approaching 75% saturation
(Fig. 2g). Consistent with this apparent decline in pulmonary function,
the persistent densities in the 30 dpi Krt6-DTR lung showed staining
for smooth muscle actin (α-SMA; Extended Data Fig. 5), a marker of
myofibroblasts known to be associated with a pre-fibrotic state of the
lung. These same interstitial regions showed weak but detectable staining
with Masson’s trichrome blue, a marker of fibrosis (Extended Data
Fig. 5). Correspondingly, whole-genome expression profiles of wild-type
and DASC(p63/Krt5)-ablated lungs indicated the persistence of inflammatory
gene expression and a relative decrease in gene expression linked to
vasculature development in the DASC(p63/Krt5)-ablated lungs at 30 dpi
(Fig. 2h; Extended Data Fig. 5). Moreover, the DASC(p63/Krt5)-ablated
mice showed the presence of a pre-fibrosis gene signature including
vimentin, FSP1 and collagen genes (Fig. 2h). Together these data suggest
that the ablation of DASC(p63/Krt5) arising during acute injury results in a
failure of the regenerative process with structural and functional consequences
for the lung.
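
To compare the two oximetry trajectories on one scale, recovery can be expressed as the fraction of the lost saturation that is regained by 40 dpi. This is an illustrative sketch only: the metric and the name `recovery_fraction` are assumptions introduced here, the inputs are the approximate SpO2 values quoted in the text, and the resulting fractions are not figures reported by the authors.

```python
def recovery_fraction(baseline: float, nadir: float, final: float) -> float:
    """Fraction of the SpO2 drop (baseline - nadir) regained at the final reading."""
    drop = baseline - nadir
    if drop <= 0:
        raise ValueError("baseline must exceed nadir")
    return (final - nadir) / drop


# Approximate values from the text: baseline 95%, nadir ~70% at 8 dpi.
normal = recovery_fraction(95.0, 70.0, 90.0)   # normal mice reach ~90%
ablated = recovery_fraction(95.0, 70.0, 75.0)  # ablated mice approach ~75%

print(f"normal: {normal:.0%}, ablated: {ablated:.0%}")
```

Under these rounded inputs the normal mice regain about four fifths of the lost saturation and the ablated mice about one fifth.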

Whereas p63+ Krt5+ cells are prominent features of the proximal
lung, their presence in distal lung has been less clear. This is reflected
in the abundant Krt5+ cells in proximal lung (Fig. 3a) and the intermittent,
peri-bronchiolar clusters seen in one of every three to four
sections of distal lung (Fig. 3a, arrows). Consistently, 100-fold fewer
DASC(p63/Krt5) colonies arise from cell suspensions of distal lung than
tracheobronchiolar stem cell (TBSC(p63/Krt5)) colonies from proximal lung
(Extended Data Fig. 6a). Regardless, both DASC(p63/Krt5) and TBSC(p63/Krt5)
can be cloned and propagated as single-cell-derived pedigrees that show
very different fates upon differentiation. In air-liquid interface (ALI)
cultures, TBSCs yield a stratified epithelium with Krt5+ basal cells
and apical ciliated and secretory cells typical of proximal airway, while
DASCs yield a monolayer of differentiated cells expressing Pdpn (Fig. 3b;
Extended Data Fig. 6b). In three-dimensional Matrigel cultures, DASCs
form unilaminar, alveolar-like spheres composed of cells expressing
type I (Pdpn and Aqp5) and type II (SPC) pneumocyte markers (Fig. 3c).
The very minor variation in gene expression between TBSCs and DASCs


Figure 3 | Cloning and in vitro differentiation of
DASCs and TBSCs. a, Krt5+ cells in proximal
(left) and distal (arrows, right) lung, with
corresponding TBSC and DASC colonies (outside
panels). Scale bar, 50 µm. b, Differentiation of
TBSC and DASC in air-liquid interface cultures
showing respectively stratified epithelia with
ciliated (acetylated-Tub+) cells and a monolayer of
differentiated cells. Scale bar, 20 µm. c, Unilaminar
spheres formed by DASCs in three-dimensional
Matrigel cultures and the expression of indicated
markers in sections. Scale bar, 50 µm. Insets, high
magnification. d, Scatter plot of gene expression
of immature TBSCs and DASCs highlighting
common and disparately expressed (fold
change > 3, P < 0.001) genes. e, Scatter plot of gene
expression of TBSC-ALI and DASC-MAT. f, Top,
heat map of gene sets differentially expressed in
murine alveoli and tracheal epithelium (P < 0.05).
Bottom, heat map of differentially expressed genes
in DASC-MAT versus TBSC-ALI (P < 0.05)
informed by alveolar and tracheal data sets.
g, Histogram of differentially expressed genes of
DASC-MAT versus TBSC-ALI for which
validating immunohistochemistry data are
available (see http://www.proteinatlas.org/).
(less than 1% with fold change > 1.5, P < 0.05; Fig. 3d) transforms to
major differences between TBSCs differentiated in air-liquid interface
(TBSC-ALI) and DASCs in Matrigel (DASC-MAT), consistent with
their divergent fates (Fig. 3e). To further probe the differential fates of
TBSC-ALI and DASC-MAT, we used laser-capture microdissection
(LCM) to generate gene expression profiles of normal tracheal and
alveolar epithelium and compared them with those of TBSC-ALI and DASC-MAT
(Fig. 3f). Gene-set enrichment analysis revealed a strong coincidence
in gene expression patterns between in vitro-differentiated TBSCs
and DASCs and their in vivo counterparts (FDR q value < 0.001),
including many genes not previously identified as differential markers
though confirmed by publicly available antibody data sets (Fig. 3g).
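
The differential-expression criterion quoted above (fold change > 1.5, P < 0.05) amounts to a two-condition filter. The sketch below is a hedged illustration: the `GeneResult` record, the `is_differential` helper, the symmetric treatment of up- and down-regulation, and all gene names and values are hypothetical and are not taken from the study's data.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    name: str
    mean_tbsc: float  # mean expression in TBSCs (arbitrary positive units)
    mean_dasc: float  # mean expression in DASCs (arbitrary positive units)
    p_value: float


def is_differential(g: GeneResult, min_fold: float = 1.5, alpha: float = 0.05) -> bool:
    """True if expression differs by more than min_fold in either direction and P < alpha."""
    hi, lo = max(g.mean_tbsc, g.mean_dasc), min(g.mean_tbsc, g.mean_dasc)
    if lo <= 0:
        raise ValueError("expression values must be positive")
    return hi / lo > min_fold and g.p_value < alpha


# Hypothetical example values, for illustration only.
genes = [
    GeneResult("geneA", 10.0, 11.0, 0.30),   # small change, not significant
    GeneResult("geneB", 10.0, 30.0, 0.001),  # 3-fold change, significant
    GeneResult("geneC", 10.0, 40.0, 0.20),   # large change, not significant
    GeneResult("geneD", 20.0, 25.0, 0.01),   # significant, but only 1.25-fold
]
print([g.name for g in genes if is_differential(g)])  # prints ['geneB']
```

Requiring both conditions is what keeps the flagged set small: a gene must change substantially and do so reproducibly.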

To determine if cloned lung stem cells could incorporate into damaged
lung, we first generated single-cell-derived pedigrees of DASC(lacZ) and
TBSC(lacZ) from murine lung (Fig. 4a; Extended Data Fig. 7a). We delivered
one million TBSC(lacZ) or DASC(lacZ) to syngeneic mice five days after
influenza virus infection (Fig. 4a). At 40 dpi (35 days post-transplantation),
DASC(lacZ) were distributed in interstitial regions emanating from
airways (Fig. 4b). At 90 dpi, DASC(lacZ) showed a more homogeneous
pattern in interstitial spaces compared to 40 dpi (Fig. 4b). Significantly,
mock-infected lungs, or mock-transplanted, infected lungs, showed


[Figure 4 panel labels. a: Krt5-CreERT2; ROSA26-lsl-lacZ DASCs + 4OH-Tmx, DASC(lacZ) pedigree, transplant (10^6 cells) at 5 dpi via ITD. b, c: DASC(lacZ) at 40 and 90 dpi; β-gal, SPC, 1H8. d: heat-map genes SPD, SPA, SPB, SPC, Aqp5, Hopx, Ki67, Krt5, p63. e: TBSC(lacZ) at 40 dpi. f: DASC(GFP) pedigree and DASC(GFP) at 40 dpi.]


Figure 4 | Transplantation of TBSC(lacZ) and DASC(lacZ). a, Schematic of
pedigree generation and transplantation. 4OH-Tmx, 4-hydroxy-tamoxifen;
ITD, intratracheal delivery. b, β-galactosidase activity in whole lung following
DASC(lacZ) transplantation. c, Comparison between β-galactosidase-positive
(left panels) and -negative (right panels) regions of transplanted lung and
markers of type I (1H8) and type II (SPC) cells. Scale bar, 50 µm. d, Heat map of
selected, differentially expressed genes (P < 0.05) comparing immature
DASC(lacZ) before transplantation with laser-capture microdissection of
lacZ-positive cells from transplanted lungs at 90 dpi. e, β-galactosidase activity
in whole lung following TBSC(lacZ) transplantation. f, From left, DASC(GFP)
colony in culture; middle, cryosection of lung following DASC(GFP)
transplantation. Scale bar, 50 µm. Right, immunofluorescence of anti-GFP,
anti-Pdpn and anti-SPC in 40 dpi transplanted lung. Scale bar, 20 µm.
no incorporation of DASC(lacZ) at 35 days post-transplantation (Fig. 4b;
Extended Data Fig. 7b). As with the lineage-tracing experiments, we used
E. coli-specific β-galactosidase antibodies to mark the transplanted cells
in 90 dpi lungs and observed that at least 40% of the cells in the alveolar
region express pneumocyte markers (Pdpn, 1H8 and SPC) and at least
80% of the β-galactosidase-positive cells in the bronchiolar region express
the secretory cell marker CC10 (n = 3 mice; Fig. 4c; Extended
Data Fig. 7c). Gene expression analysis of the lacZ-positive regions of
these lungs using laser-capture microdissection revealed a typical alveoli
gene signature very different from that of immature DASCs or of
damaged lung (Fig. 4d). Together these findings demonstrate that single-
cell-derived pedigree lines of DASCs can readily incorporate into damaged
lung during the process of lung regeneration and give rise to multiple
epithelial cell types of bronchioles and alveoli. In contrast, transplanted
TBSC(lacZ) appeared confined to major airways (Fig. 4e). Parallel transplantations
with green fluorescent protein (GFP)-labelled DASC (Fig. 4f)
yielded similar patterns of co-labelling of lineage and type I and type II
pneumocyte markers at 40 dpi seen with transplanted DASC(lacZ) (Fig. 4f).
Significantly, a fraction of these transplanted DASCs or their progeny
continue to express the proliferation marker Ki67 even up to 60 dpi
(Extended Data Fig. 8), suggesting their high viability and extended
contribution to the regenerative process. Lastly, morphometric analyses
of diphtheria-toxin-treated, virally infected Krt6-DTR mice indicate
that transplantation of DASCs results in a significant reduction of
interstitial densities at 40 dpi (Extended Data Fig. 9).

In the present work, we highlight the remarkable regenerative capacity
of the lung following large-scale, acute lung damage and the function
of a very discrete, pre-existing population of lung stem cells in this
process. In addition, we demonstrate that upon transplantation, single-
cell-derived DASC(p63/Krt5) pedigrees contribute multiple epithelial lineages,
including bronchiolar secretory cells as well as alveolar type I and type
II pneumocytes, to regenerating distal lung. Thus DASC(p63/Krt5) act in
an emergent, conditional manner that is generally distinct from that of
type II pneumocytes, progenitor cells of limited self-renewal capacity
that participate in highly focal, homeostatic lung repair. Our findings
provide a mechanistic framework for the still emerging concept
of lung regeneration and underscore potential therapeutic strategies
exploiting this process.
METHODS


Influenza virus infection. All mouse experiments were conducted under IACUC 
guidelines and approved protocols. Influenza A (H1N1) mouse-adapted PR/8/34 
(VR-95, ATCC, USA) was used for all viral infections. The virus stock was
amplified by V. Chow (Department of Microbiology, National University of Singapore)
in chicken eggs. Virus dilutions were made on ice in DMEM medium containing 1 µg ml−1
TPCK trypsin (Sigma-Aldrich, USA), aliquoted and stored at −80 °C. The
viral titre was measured by plaque assay on Madin Darby canine kidney cells (MDCK,
ATCC, USA). Virus was further diluted to the final concentration in PBS on ice and
used fresh. The infection of mice by H1N1 influenza virus was performed in
an Animal Biosafety Level 2 (ABSL-2) facility. Adult mice (>6 weeks old) were
anaesthetized with intraperitoneal injection of ketamine (150 mg per kg body
weight) + xylazine (10 mg per kg body weight). The anaesthetized mouse was
rested on a stand with its front teeth hung over a suture. This position relaxes
the airway and makes it accessible. Using flat forceps, the tongue of the animal
was drawn out of its mouth so that the anatomy could be easily visualized. Intratracheal
delivery was performed by pipetting 50 µl virus directly into the larynx/trachea.
Sterile PBS was administered to the control animals. The tongue of the mouse was held
throughout the procedure so that the virus was aspirated into the lungs.
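The weight-based anaesthetic doses above can be converted to an absolute amount per animal. The following is a minimal sketch of that arithmetic only; the 25 g body weight and the function name are hypothetical and not taken from the study.

```python
def injection_dose_mg(body_weight_g, dose_mg_per_kg):
    """Return the absolute dose in mg for an animal of the given weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    # Convert grams to kilograms, then scale by the per-kg dose rate.
    return dose_mg_per_kg * body_weight_g / 1000.0


# Dose rates stated in the Methods (mg per kg body weight).
KETAMINE_MG_PER_KG = 150
XYLAZINE_MG_PER_KG = 10

weight_g = 25  # hypothetical mouse weight, for illustration only
ketamine_mg = injection_dose_mg(weight_g, KETAMINE_MG_PER_KG)
xylazine_mg = injection_dose_mg(weight_g, XYLAZINE_MG_PER_KG)
print(f"ketamine {ketamine_mg:.2f} mg, xylazine {xylazine_mg:.2f} mg")
```

The injected volume would additionally depend on the stock concentration of each drug, which is not stated here.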

Tissue histology. At appropriate time points, mice were euthanized by CO2
asphyxiation followed by exsanguination, and the diaphragm was carefully cut
open without touching the lungs. A small incision was made in the proximal
region of the trachea and the lung was inflated with 4% formaldehyde using a 30G
needle. The inflated lungs were dissected and fixed with 4% formaldehyde before
whole mount imaging, paraffin section or cryosection. For whole mount imaging,
lungs were dehydrated in a graded ethanol series and sunk in BABB (benzyl alcohol/
benzyl benzoate 1:2 ratio) at 4 °C overnight (ref. 31). For paraffin section, lungs were
processed in an automatic tissue processor (Leica Microsystems, Germany) and
embedded into paraffin blocks. The blocks were cut using a microtome (Leica
Microsystems, Germany) to 5–7 µm thickness at distinct planes. The sections were
placed on poly-lysine coated glass slides and stored at room temperature until
further use. For cryosection, lungs were embedded within Tissue-Tek O.C.T compound,
solidified on dry ice and cut using a cryotome (Leica Microsystems, Germany)
at 10 µm thickness.

Haematoxylin and eosin (H&E) staining was performed using standard procedures.
To analyse lung damage level, interstitial densities were assessed by H&E
staining backed up by type I (anti-Pdpn) and type II (anti-SPC) pneumocyte
staining. A minimum of 8 axial lung interval sections (typically 400 mm²) covering
>2 mm tissue depth were cut; each was stained, scored for densities, and quantified
for percentage of total lung area by Zeiss AxioVision (Carl Zeiss, Germany)
morphometric software. In addition, random histological sections were scored
based on the general pathological morphology by a blinded expert to confirm the
conclusion. Stitched scans of H&E slides were acquired in the histopathology
laboratory of IMCB, A*STAR, Singapore. Masson's trichrome staining of lung fibrosis
was performed using the Trichrome Staining Kit (Sigma-Aldrich, USA). The kit 
involves sequential staining of the sections with Biebrich Scarlet-Acid Fuchsin, 
PTA/PMA and Aniline Blue. After staining, sections were dehydrated and mounted 
using Vectamount and visualized under a light microscope (Imager Z1, Carl Zeiss, 
Germany). 
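The morphometric readout described above is a percentage of total lung area occupied by interstitial densities, pooled over the scored sections. A minimal sketch of that calculation follows, assuming each section yields a density area and a total tissue area in the same units; the numbers are hypothetical and the function does not reproduce the AxioVision software.

```python
def percent_density_area(sections):
    """Pooled percentage of lung area scored as interstitial density.

    sections: iterable of (density_area, total_area) pairs, one per
    histological section, with both areas in the same units.
    """
    sections = list(sections)
    total_area = sum(total for _, total in sections)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    density_area = sum(density for density, _ in sections)
    return 100.0 * density_area / total_area


# Hypothetical measurements for eight sections (arbitrary area units).
example_sections = [(12, 100), (8, 95), (15, 110), (10, 100),
                    (9, 105), (11, 98), (14, 102), (7, 90)]
print(f"{percent_density_area(example_sections):.1f}% of lung area")
```

Pooling areas before dividing weights large sections more heavily than averaging per-section percentages would; which of the two the authors used is not stated.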

Immunofluorescence staining. For immunofluorescence staining, paraffin- 
embedded tissue slides were subjected to antigen retrieval in citrate buffer (pH 6, 
Sigma-Aldrich, USA) at 120°C for 20 min with the exception of CD45 staining. 
Antibodies used for immunofluorescence included stem cell markers: Krt5 (1:200, 
EP1601Y, Thermo), Krt6 (1:100, T-18, Santa Cruz and Ab24646, Abcam), p63 (1:2, 
4A4 clone, house-made); pneumocyte markers: Pdpn (1:100, M-172 and A-18, 
Santa Cruz), Aqp5 (1:100, G-19 and H-200, Santa Cruz), Hopx (1:100, FL-73, Santa 
Cruz), SPC (1:100, M-20 and FL-197, Santa Cruz), 1H8 mouse monoclonal antibodies 
(1:2, house-made); others: CC10 (1:100, T-18, Santa Cruz), CD45 (1:100, 30-F11, Santa 
Cruz), bacteria-specific β-galactosidase (1:400, A-11132, Life Technologies and
Ab9361, Abcam), Ki67 (1:200, RM-9106, Thermo), alpha-SMA (1:400, 1A4, Dako), 
HB-EGF (1:100, Ab16783, Abcam), GFP (1:100, B-2, Santa Cruz) and acetylated 
alpha tubulin (1:1,000, Ab24610, Abcam). Among them, the 1H8 murine monoclonal
antibody was generated under IRB approval using standard methods, and validated
for specific mouse type I pneumocyte staining by immunofluorescence.
Alexa-conjugated secondary antibodies (1:200) were used for immunofluorescence. After
staining, tissue slides underwent auto-fluorescence removal and mounting with
DAPI-containing mounting media (Vectashield, Vector Labs, USA). Stained slides
were stored at 4 °C in the dark and images were taken using a Zeiss fluorescence
microscope (Observer Z1, Carl Zeiss, Germany) or a Zeiss confocal microscope (LSM
510, Carl Zeiss, Germany). 

Lineage tracing of Krt5+ stem cells. Krt5-CreERT2 (011916-MU, MMRRC,
USA) mice were crossed with Rosa26-loxP-STOP-loxP-lacZ mice (003309, Jackson 
Laboratory, USA) to generate mice for lineage tracing. Genotype was confirmed 
for each mouse using tail genomic DNA collection and PCR validation. Tamoxifen
was dissolved in corn oil and freshly applied to mice at 200 mg per kg body weight 
through intraperitoneal injection at indicated days before influenza infection. The 
gap between tamoxifen and H1N1 administration was varied as indicated to control 
for any possible tamoxifen persistence. Sublethal doses (25 plaque forming units) 
of H1N1 influenza A virus were diluted in PBS and intratracheally delivered into 
anaesthetized mice. After infection by virus, mouse lungs were collected at various 
time points and subjected to standard immunofluorescence staining by two independent
bacteria-specific β-galactosidase antibodies (1:400, A-11132, Life Technologies
and Ab9361, Abcam) or X-gal staining. For X-gal staining, lungs were
briefly fixed on ice for 30 min and subjected to X-gal (Invitrogen, USA)
whole-mount staining overnight using a standard protocol. After staining, lungs were washed
and fixed again in 4% formaldehyde before whole-mount visualization or paraffin
sectioning. For whole-mount visualization, lungs were made transparent by BABB
as described before and images were taken using a dissection microscope (Leica
Microsystems, Germany). For paraffin section, 5–7 µm sections were cut and
immunofluorescence staining was performed following a standard protocol and visualized under
a light microscope (Imager Z1, Carl Zeiss, Germany).

Generation of the Krt6-DTR mouse. The complementary DNA of diphtheria
toxin receptor (DTR), which is also known as human heparin-binding epidermal
growth factor-like growth factor (HB-EGF), and a neomycin resistance selection
cassette flanked with loxP sites (Floxed NeoR) with an introduced PacI restriction
endonuclease site, were introduced to replace the first 4 exons of Krt6a in a
modified bacterial artificial chromosome (mBAC). Retrieval and linearization of
a selected section of this mBAC resulted in the targeting construct, which was
electroporated into V6.4 B6.129 hybrid embryonic stem cells to be selected with
G418. Single colonies were screened by Southern blot analysis of PacI digests detected
by a 5′ external probe (hybridizing to unmodified Krt6a genomic DNA), which revealed
fragment size differences due to the introduced PacI site. Wild-type alleles with
endogenous PacI sites returned 35.9 kb fragments, while a recombination event
returned a shorter 14.7 kb fragment. Floxed NeoR probes provided further verification for a
single specific insertion. Successfully engineered embryonic stem cells were
microinjected into blastocysts to generate chimaeras, which were similarly tested for
germline transmission via backcrosses to C57BL/6. Progeny from crosses with
FVB/N-Tg(ACTB-cre)2Mrt/J (Jackson Laboratory, USA) were screened for stable
transmission of Krt6a-DTR alleles and Cre-mediated excision of Floxed NeoR, as
indicated by a reduced 12.9 kb PacI digest fragment.

Oxygen saturation measurements. Peripheral capillary oxygen saturation (SpO2)
was measured using a MouseOx Plus pulse oximeter (Starr Life Sciences, USA). An
S-size CollarClip sensor was applied to depilated regions of the back neck skin and
mice were rested for one hour before measuring SpO2. Ten minutes before
measurement, mice were anaesthetized by ketamine (150 mg per kg body weight) +
xylazine (10 mg per kg body weight) intraperitoneal injection. After SpO2 readings
were stable, data collection was started and SpO2 readings were recorded every
second for one minute to calculate an average value.
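The averaging step described above reduces one minute of once-per-second readings to a single value. A minimal sketch follows, assuming the readings arrive as a plain list of percentages; the values and the function name are hypothetical, and the MouseOx software is not reproduced.

```python
from statistics import mean


def average_spo2(readings, expected_count=60):
    """Average one minute of once-per-second SpO2 readings (percent)."""
    if len(readings) != expected_count:
        raise ValueError(
            f"expected {expected_count} readings, got {len(readings)}"
        )
    return mean(readings)


# Hypothetical stable trace: 30 s at 98% followed by 30 s at 96%.
trace = [98.0] * 30 + [96.0] * 30
print(f"mean SpO2: {average_spo2(trace):.1f}%")
```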

Cloning of TBSC and DASC and in vitro differentiation. To isolate airway stem
cells, trachea and lung were collected from adult mice and immersed in cold wash
buffer (F12 medium, 1% Pen/Strep, 5% FBS). The trachea and two main bronchi
were separated from the lungs and the lobes were cut with a sterile surgical blade
into small pieces and digested with dissociation buffer (F12/DMEM, 1 mg ml−1
protease, 0.005% trypsin and 10 ng ml−1 DNase I) overnight with gentle rocking.
The dissociated cells were washed with wash buffer, passed through a 40-µm cell
strainer, counted and plated onto irradiated 3T3 feeder cells as described. After
4 consecutive passages, single-cell colonies were picked using cloning rings and
expanded. Colonies were characterized by immunofluorescence staining (E-cadherin+
Krt5+ p63+ Pdpn− CC10− SPC−). These colonies have been passaged for up to 12 months
with no observable phenotypic or chromosome count changes. To compare gene
expression in immature TBSCs and DASCs, whole genome expression microarrays
were performed on passage 7 (P7) cells. Figure 3c is a scatterplot comparison
of the TBSC and DASC whole transcriptome data. Gene names are indicated for
outliers (fold change > 3, P value < 0.001) typically reported as markers or well
known for other biology and for which we validated expression by quantitative
PCR. To activate lacZ in TBSCs and DASCs cloned from ROSA26-lsl-lacZ;
Krt5-CreERT2 mice, we exposed colonies in vitro to 4-OH-tamoxifen (1 µg ml−1;
Sigma-Aldrich) for 5 days, at which point >70% of cells expressed β-galactosidase and all cells
were Krt5+ p63+ SPC− CC10− Pdpn−. GFP labelling was performed by retroviral
transduction of pMX-GFP (Cell Biolab, USA) followed by manual sorting of single
GFP+ DASCs to 96-well plates.
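The outlier criterion quoted above for the TBSC versus DASC scatterplot (fold change > 3, P value < 0.001) can be expressed as a simple filter. This is a sketch under two assumptions that the text does not state: expression values are on a log2 scale, and the fold-change cut-off applies in either direction. Gene names and values are placeholders.

```python
def select_outliers(genes, min_fold=3.0, max_p=0.001):
    """Return names of genes passing the fold-change and P-value cut-offs.

    genes: iterable of (name, log2_tbsc, log2_dasc, p_value) tuples.
    Fold change is taken in either direction (assumption).
    """
    selected = []
    for name, log2_tbsc, log2_dasc, p_value in genes:
        # Difference of log2 intensities back-transformed to a ratio >= 1.
        fold_change = 2 ** abs(log2_dasc - log2_tbsc)
        if fold_change > min_fold and p_value < max_p:
            selected.append(name)
    return selected


# Placeholder data, not values from the study.
example_genes = [
    ("geneA", 8.0, 10.0, 0.0001),  # 4-fold, significant -> kept
    ("geneB", 8.0, 9.0, 0.0001),   # 2-fold -> dropped
    ("geneC", 10.0, 7.0, 0.01),    # 8-fold but P too high -> dropped
    ("geneD", 11.0, 9.0, 0.0005),  # 4-fold, significant -> kept
]
print(select_outliers(example_genes))
```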

Matrigel differentiation assays were performed as previously described. FGF10
(50 ng ml−1) was included in medium to favour distal airway differentiation. Under
this condition, DASCs clustered and grew into sphere-like structures marked by
unilaminar epithelia surrounding a clear lumen. TBSCs were grown on air-liquid
interface (ALI) cultures for differentiation. ALI differentiation was performed as
previously described; FGF10 was excluded and retinoic acid (50 nM) included in
medium to favour proximal airway differentiation. Under ALI conditions, TBSCs
formed a stratified epithelium structure marked by ciliated cells (acetylated-Tub+)
and basal cells (Krt5+). In addition, we used laser capture microdissection (LCM,
Zeiss PALM) to dissect mouse trachea epithelium, alveoli and damaged lung
interstitium for microarray analysis to develop tissue-specific gene expression signatures.

Orthotopic transplantation of stem cells. To perform orthotopic transplantation
of stem cells into lung, adult mice were infected with 25 plaque forming units
H1N1 influenza virus five days prior to transplantation. Stem cell pedigrees were
expanded in culture and harvested by differential trypsinization to remove feeder
cells. One million cells were diluted in 50 µl DMEM/F12 medium for transplantation
into each mouse. Adult mice (>6 weeks old) were anaesthetized with
intraperitoneal injection of ketamine (150 mg per kg body weight) + xylazine (10 mg
per kg body weight) and rested on a stand. Intratracheal aspiration was
performed by pipetting the cells directly into the trachea via the mouth. For rescue of
Krt6-DTR phenotype by DASC transplantation, 3X 10° cells were transplanted on 
8 dpi (the same day DTox was given) and control mice received medium alone. 
Laser capture microdissection (LCM). For LCM, fresh, non-fixed or X-gal stained 
tissue samples were embedded in Tissue-Tek O.C.T. compound (Sakura, Japan) 
on dry ice. Cryosections (10 µm) were mounted on PEN Membrane slides (Leica
Microsystems, Germany). Slides were dehydrated in 95%, 75%, 50% nuclease-free 
ethanol for 30 s each, stained in Arcturus HistoGene staining solution (Life Tech- 
nologies, USA) for 1 min then dehydrated in 50%, 75%, 95%, 100% ethanol for 30s 
each and xylene for 5 min. Slides were allowed to air dry and LCM was performed 
immediately using an inverted microscope and PALM Robo software (Carl Zeiss, 
Germany). Cut elements were catapulted onto an Adhesive Cap-500 tube (Carl 
Zeiss) and subsequently transferred into a PCR tube with 50 µl Arcturus PicoPure
extraction buffer. RNA was isolated using the Arcturus PicoPure RNA Isolation kit 
(Life Technologies, USA). 

Microarray and bioinformatics. RNAs obtained from LCM, cell colonies or 
whole lobe of mouse lungs were used for microarray after being amplified using 
WT-Ovation Pico RNA Amplification System (NUGEN, UK) and fragmented and 
labelled using the FL-Ovation cDNA Biotin Module V2 (NUGEN, UK). Labelled 
cDNA was then hybridized onto GeneChip Mouse Exon 1.0 ST Array (Affymetrix, 
USA) using appropriate hybridization controls and the chips were scanned and
analysed as described previously. Duplicate experiments for microarray were
taken from two biological samples. To validate sample quality, probe hybridization
ratios were calculated using Affymetrix Expression Console software (Affymetrix,
USA). The intensity values were log2-transformed and imported into the Partek
Genomics Suite 6.6 (Partek Inc., USA). Exons were summarized to genes and
1-way ANOVA was performed to identify differentially expressed genes. P values
and fold-change were calculated for each analysis. Heat maps were generated using
Pearson's correlation and Ward's method and principal component analysis was
conducted using all probe sets. Gene ontology analyses were performed using the
web-based GeneTrail tool (ref. 32).
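The one-way ANOVA mentioned above compares the variance between groups with the variance within groups for each gene. The sketch below computes only the textbook F statistic for a single gene in plain Python; it is an illustration, not the Partek Genomics Suite implementation, and the expression values are invented.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups.

    groups: list of lists of (log2) expression values, one list per group.
    Raises ZeroDivisionError if there is no within-group variance.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Within-group sum of squares: squared deviation from own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented log2 intensities for one gene in two conditions.
print(one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))
```

Converting F to a P value requires the F distribution with (df_between, df_within) degrees of freedom, which is omitted here to keep the sketch dependency-free.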

For the comparison between gene expression profiles of in situ alveoli and 
tracheal epithelium with those of in vitro differentiated TBSCs and DASCs (see 
Fig. 3f, g), gene set enrichment analyses were performed using web-based tools
developed by the Broad Institute (refs 33, 34). A total of 605 genes highly expressed in tracheal
epithelium (trachea versus alveoli, 4-fold higher, P value < 0.05) and 914 genes highly
expressed in alveolar epithelium (trachea versus alveoli, 4-fold lower, P value <
0.05) were used as the queried gene sets (signatures), respectively. Enrichment
scores were determined after 1,000 permutations, and the permutation type was
configured to gene sets. Thereafter the whole genome expression data of DASC-MAT
versus TBSC-ALI were applied to the GSEA program to evaluate enrichment of
each signature in the fold-change ordered list. Results with normalized enrichment
score > 1.4 and FDR q value < 0.001 were considered significant. The comparison
between gene expression profiles of normal alveoli and damaged interstitium with
those of lung tissue with or without DASC ablation (see Extended Data Fig. 3a) was
performed in a similar manner.
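The enrichment score at the core of the analysis above is a running-sum statistic over the fold-change ordered gene list. The sketch below shows only the unweighted (Kolmogorov-Smirnov-like) form as a simplifying assumption; the Broad Institute program also weights hits by the ranking metric and normalizes scores against permutations, neither of which is reproduced here. Gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list.

    Walks down the ranked list, stepping up by 1/n_hits at each gene in
    the signature and down by 1/n_misses otherwise, and returns the
    signed running sum with the largest absolute value.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("need at least one hit and one miss")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder ranking: signature genes at the top give a positive score,
# signature genes at the bottom give a negative score.
ranking = ["a", "b", "c", "d", "e", "f"]
print(enrichment_score(ranking, {"a", "b"}))
print(enrichment_score(ranking, {"e", "f"}))
```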


31. Becker, K., Jahrling, N., Saghafi, S. & Dodt, H. U. Immunostaining, dehydration, 
and clearing of mouse embryos for ultramicroscopy. Cold Spring Harb. Protoc. 
2013, 743-744 (2013). 

32. Keller, A. et al. GeneTrailExpress: a web-based pipeline for the statistical
evaluation of microarray experiments. BMC Bioinformatics 9, 552 (2008). 

33. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based
approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. 
USA 102, 15545-15550 (2005). 

34. Mootha, V. K. et al. PGC-1 alpha responsive genes involved in oxidative 
phosphorylation are coordinately downregulated in human diabetes. Nature 
Genet. 34, 267-273 (2003). 
Extended Data Figure 1 | Lineage tracing of Krt5+ cells. a, Left,
immunofluorescence images of sections of 15 dpi lung with staining patterns of
antibodies to pan-leukocyte marker CD45 and the type I pneumocyte marker
Pdpn with DNA counterstained with DAPI. Right, immunofluorescence
images of pan-leukocyte marker CD45 and the type II pneumocyte marker
SPC. Scale bar, 150 µm. b, X-gal staining (blue) to reveal lacZ-dependent
β-galactosidase activity in whole lungs after 15 and 40 days post infection
following long time gaps between induction of lacZ labelling by tamoxifen and
influenza infection-induced lung damage. The similarity of this long gap
labelling and the short gap labelling presented in Fig. 1 argues against prolonged
actions of tamoxifen in these lineage-labelling protocols. Tamoxifen was given at
indicated times before infection and a no-tamoxifen control is included.
c, Immunofluorescence images of colonies of DASCs derived from tamoxifen-treated
ROSA26-lsl-lacZ; Krt5-CreERT2 mice stained with antibodies to keratin
5 (Krt5) or Krt5 and E. coli-specific β-galactosidase. d, Histological section
of lung at 15 dpi stained with E. coli-specific β-galactosidase antibody and
markers of secretory cells (CC10+) and expanded stem cells (Krt5+). Scale bar,
50 µm. e, Whole-mount image of X-gal developed, uninfected lung from
ROSA26-lsl-lacZ; Krt5-CreERT2 mice which received tamoxifen treatments at −69,
−66 and −63 days before dissection. f, Histological section of 60 dpi lung
stained with E. coli-specific β-galactosidase antibody and markers of type I
pneumocytes (Pdpn+) and type II pneumocytes (SPC+).
Extended Data Figure 2 | Conditional DASC^p63/Krt5 ablation mouse model.
a, Schematic of Krt6a locus, the targeting vector constructed to introduce
the human diphtheria toxin receptor (DTR). b, The structure of the
modified Krt6a locus in embryonic stem cells screened by Southern blot.
c, Co-expression of Krt6 and DTR in Krt5+ pods in 15 dpi lung. d, Histogram
showing resistance of wild-type, 12 dpi DASC^p63/Krt5 to diphtheria toxin
(DTox) and the sensitivity of DASC^p63/Krt5/DTR to diphtheria toxin. n = 3 mice
per group. Error bars, s.e.m.
Enrichment plot: Normal alveolar signature (q < 0.001)
Extended Data Figure 3 | Persistent damage in DASC^p63/Krt5-ablated lungs.
a, Gene set enrichment analysis (GSEA) showing the overrepresentation of
normal alveolar signature gene sets in WT rather than Krt6-DTR mouse lungs
(whole lobes). To build the normal alveolar signature, laser capture
microdissection of frozen sections was used to dissect normal alveoli region
from 0 dpi lung and damaged interstitial infiltrated region from 15 dpi lung for
microarray analysis. Differentially expressed genes (fold change > 5, P < 0.01)
were used to develop normal alveolar gene expression signatures. b, Top
panel, histological analysis of lung densities using anti-Pdpn antibodies (red)
and anti-CD45 (green) to reveal type I pneumocytes and leukocyte infiltration,
respectively. Left, wild-type mice showing apparently normal lung region
adjacent to interstitial density having Pdpn+ network but lacking CD45+
infiltrates. Right, Krt6-DTR lung showing apparently normal region adjacent
to zone of damaged interstitial lung lacking Pdpn+ network but having CD45+
infiltrates. Bottom panel, H&E staining of the same histological region.
Scale bar, 100 µm.
Extended Data Figure 4 | Networks of type I pneumocytes in 30 dpi mouse
lung. a, Histological analysis of lung densities using anti-Pdpn antibodies (red)
and anti-SPC (green) to reveal type I and type II pneumocytes respectively.
Left, wild-type mice showing apparently normal lung region adjacent to
interstitial density having Pdpn+ network but lacking SPC+ cells. Right,
Krt6-DTR lung showing normal region adjacent to zone of damaged interstitial
lung lacking both pneumocytes. Scale bar, 100 µm. b, Top panel, histological
analysis of wild-type lung densities using anti-Pdpn antibodies (red) and
anti-Aqp5 (green) type I pneumocyte markers showing the interstitial density
having Pdpn/Aqp5 double-positive network. Bottom panel, wild-type mice
show apparently normal lung region adjacent to interstitial density having
Pdpn+ network, but the density lacks expression of another type I pneumocyte
marker, Hopx. Scale bar, 100 µm.
Extended Data Figure 5 | Failure of regeneration in DASC^p63/Krt5-ablated
lungs. a, Histological section through 30 dpi DASC-ablated lung (Krt6-DTR
+DTox) showing normal region (Pdpn+) adjacent to interstitial density
positive for α-SMA and weakly positive for Masson's trichrome (MT) staining
for fibrosis. Scale bar, 100 µm. b, Histological section through 30 dpi control
lung showing normal region (Pdpn+) and interstitial density (Pdpn+) which
are both negative for α-SMA. c, Expression heat map of selected, differentially
expressed genes (P < 0.05) comparing wild-type mouse lungs with DASC-ablated
mouse lungs at 30 dpi. Scale bar, 100 µm.
Extended Data Figure 6 | Cloning and in vitro differentiation of
TBSC^p63/Krt5 and DASC^p63/Krt5. a, Histogram of cloning efficiency of TBSCs
and DASCs on irradiated 3T3-J2 cells per 1 million tracheal or distal airway
cells derived from respective tissues of adult mice. Tissues derived from 3 mice.
Error bars, s.e.m. b, Immunofluorescence images of sections of TBSC and
DASC air-liquid interface cultures using an antibody to the type I pneumocyte
marker Pdpn (green). Sections were counterstained with DAPI (blue).
Extended Data Figure 7 | Transplantation of DASC^lacZ.
a, Immunofluorescence characterization of DASCs isolated from
Krt5-CreERT2;ROSA26-lsl-lacZ mice following Cre activation with 4OH-tamoxifen.
From left, colony stained with antibodies to p63 (green) and Krt5 (red),
p63 (green) and E. coli β-galactosidase (red), Krt5 (red) and CC10 (green),
and Krt5 (red) and SPC (green). b, Whole mount image of lung 90 days after
infection without stem cell transplantation. c, Left, bright field/
immunofluorescence image of section of lung at 90 dpi following
transplantation of DASC^lacZ stained with antibodies to β-galactosidase (red).
Right panels, immunofluorescence images of co-staining of transplanted
DASC^lacZ with antibodies to Pdpn, SPC, or CC10 at high magnification.
Extended Data Figure 8 | Persistent proliferation of transplanted DASC.
Co-staining of antibodies to GFP (green) with the cell proliferation marker
Ki67 (red) in sections of lung transplanted with DASC^GFP at 12 dpi lung
(7 days post transplantation) and 60 dpi lung (55 days post transplantation).
Top left, immunofluorescence image of lung following transplantation of
DASC^GFP (7 days post-transplantation; 12 dpi) stained with anti-GFP (green)
and the cell cycle marker Ki67 (red, in nucleus). Top right, bronchiole
co-stained with antibodies to GFP and Ki67 from 7 days post-transplantation
lung. Bottom, staining of interstitial lung transplanted 55 days prior with
DASC^GFP with antibodies to GFP and Ki67. Arrows indicate cells
co-expressing GFP and Ki67.
Extended Data Figure 9 | Stem cell transplantation reduces interstitial
densities in DASC^p63/Krt5-ablated lungs. a, Histological sections through entire
lobe of Krt6-DTR mice with (left) and without (right) diphtheria toxin
treatment forty days post-influenza infection. b, Histogram of morphometric
quantification of lung densities following 40 day influenza virus infection of
Krt6-DTR mice without diphtheria toxin (−DTox, mouse number n = 3),
with diphtheria toxin (+DTox, n = 4), or with diphtheria toxin and
transplanted DASCs (+DTox+DASC, n = 4). Error bars indicate s.e.m.
and # indicates P value = 0.029 by Wilcoxon rank-sum test.
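As a worked illustration of the Wilcoxon rank-sum test named in this legend, the sketch below enumerates every possible rank assignment to obtain an exact two-sided P value. It assumes no tied values and uses invented measurements; which two groups the authors compared is not stated and is not assumed here. For two groups of four with no overlap, the exact value is 2/70, which rounds to 0.029.

```python
from itertools import combinations


def exact_rank_sum_p(x, y):
    """Exact two-sided Wilcoxon rank-sum P value (assumes no ties)."""
    pooled = sorted(x + y)
    rank_of = {value: index + 1 for index, value in enumerate(pooled)}
    n_x, n_total = len(x), len(pooled)

    observed = sum(rank_of[value] for value in x)
    expected = n_x * (n_total + 1) / 2
    deviation = abs(observed - expected)

    # Count rank subsets at least as far from the expected sum as observed.
    extreme = 0
    n_subsets = 0
    for subset in combinations(range(1, n_total + 1), n_x):
        n_subsets += 1
        if abs(sum(subset) - expected) >= deviation - 1e-12:
            extreme += 1
    return extreme / n_subsets


# Invented, fully separated measurements for two groups of four.
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [5.0, 6.0, 7.0, 8.0]
print(round(exact_rank_sum_p(group_a, group_b), 3))
```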
Lineage-negative progenitors mobilize to regenerate
lung epithelium after major injury 


Andrew E. Vaughan, Alexis N. Brumwell, Ying Xi, Jeffrey E. Gotts, Doug G. Brownfield, Barbara Treutlein, Kevin Tan,
Victor Tan, Feng Chun Liu, Mark R. Looney, Michael A. Matthay, Jason R. Rock & Harold A. Chapman


Broadly, tissue regeneration is achieved in two ways: by proliferation 
of common differentiated cells and/or by deployment of special- 
ized stem/progenitor cells. Which of these pathways applies is both 
organ- and injury-specific. Current models in the lung posit that
epithelial repair can be attributed to cells expressing mature
lineage markers. By contrast, here we define the regenerative role of
previously uncharacterized, rare lineage-negative epithelial stem/ 
progenitor (LNEP) cells present within normal distal lung. Quiescent 
LNEPs activate a ΔNp63 (a p63 splice variant) and cytokeratin 5
remodelling program after influenza or bleomycin injury in mice. 
Activated cells proliferate and migrate widely to occupy heavily injured 
areas depleted of mature lineages, at which point they differentiate 
towards mature epithelium. Lineage tracing revealed scant contri- 
bution of pre-existing mature epithelial cells in such repair, whereas 
orthotopic transplantation of LNEPs, isolated by a definitive surface 
profile identified through single-cell sequencing, directly demonstrated 
the proliferative capacity and multipotency of this population. LNEPs 
require Notch signalling to activate the ΔNp63 and cytokeratin 5
program, and subsequent Notch blockade promotes an alveolar cell 
fate. Persistent Notch signalling after injury led to parenchymal ‘micro- 
honeycombing’ (alveolar cysts), indicative of failed regeneration. Lungs 
from patients with fibrosis show analogous honeycomb cysts with 
evidence of hyperactive Notch signalling. Our findings indicate that 
distinct stem/progenitor cell pools repopulate injured tissue depend- 
ing on the extent of the injury, and the outcomes of regeneration or 
fibrosis may depend in part on the dynamics of LNEP Notch signalling. 
Influenza infection challenges pulmonary regenerative capacity owing
to the widespread ablation of epithelial cells in substantial areas of lung
(Extended Data Fig. 1g, h). A robust expansion of regenerative
cytokeratin-5-positive (Krt5+) cells in the lung parenchyma after influenza
infection has been observed in mice, which we confirmed (Extended
Data Fig. 1). In addition, we directly observed migration (Supplementary
Videos 1, 2 and 3) and identified coexpression of integrin α6β4 (Extended
Data Figs 1 and 2). These cells also appear variably after bleomycin
injury, in which approximately one-third of the Krt5+ cells resolved
into type II pneumocytes by 50 days after injury (Extended Data Fig. 3).
A cellular origin and mechanistic framework for expansion after influ- 
enza, and potential parallels in human lung injury, remain unknown. 
To define the cell of origin, we lineage-traced mature cell types implicated
in epithelial repair. Krt5+ cells appearing by day 11 after influenza
infection were essentially completely untraced using Clara cell-specific 
protein (CC10) and surfactant protein C (SPC) Cre-recombinase drivers 
(CC10-CreERT2 and SPC-CreERT2, respectively, containing tamoxifen- 
inducible Cre-modified oestrogen receptor fusion proteins) (Fig. 1b-e 
and Extended Data Fig. 1i). Analysis at 7-8 days after injury confirmed 
mutual exclusivity of CC10-CreERT2-labelled and Krt5+ cells (Fig. 1b).
Conflicting results in other reports are probably caused by tamoxifen 
persistence (Supplementary Discussion and Extended Data Fig. 4). 


A small fraction (13%) of expanded Krt5+ cells bear the Krt5-CreERT2
lineage label (Fig. 1f, g), raising the possibility that tracheal basal cells
might migrate distally during injury. We transplanted sections of fluorescent
trachea into syngeneic animals and a non-fluorescent left lung
into a fluorescent mouse. Abundant Krt5+ cells arose after infection but
none were fluorescent (Fig. 1h and Extended Data Fig. 1j, k). These results
indicate that upper-airway basal cells do not contribute to this phenomenon and
instead implicate a lineage-negative epithelial progenitor (LNEP) as the
major source of ΔNp63+ and Krt5+ cells.

To characterize quiescent LNEPs, we used integrin β4 expression in
CC10-CreERT2 mice to segregate LNEPs from club cells in uninjured
lungs (Fig. 2a) and confirmed minimal expression of mature lineage
markers (Extended Data Fig. 5c). The CC10− β4+ (LNEP-containing)
population uniquely expressed ΔNp63 (Extended Data Fig. 5c). ΔNp63+
cells were identified in situ scattered sporadically throughout distal airways
(Fig. 2c). These cells did not express detectable Krt5 protein (Extended
Data Fig. 5a). In a total of 65 small airways examined in two
mice, we identified 24 ΔNp63+ cells. Only 7 of the 24 cells were labelled
in Krt5-CreERT2 mice (Fig. 2c and Extended Data Fig. 5a), probably
explaining the small fraction of post-injury Krt5+ cells bearing the
Krt5-CreERT2 lineage label (Fig. 1f, g).
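The counts reported above translate into simple proportions. The sketch below only restates that arithmetic from the numbers in the text (24 cells in 65 airways, 7 of 24 labelled); the function name is illustrative.

```python
def percent_labelled(labelled, total):
    """Percentage of counted cells that carry the lineage label."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * labelled / total


# Counts taken from the text above.
airways_examined = 65
dnp63_cells = 24
krt5_traced_cells = 7

print(f"{percent_labelled(krt5_traced_cells, dnp63_cells):.1f}% labelled")
print(f"{dnp63_cells / airways_examined:.2f} cells per airway on average")
```

On these counts, roughly 29% of quiescent ΔNp63+ cells carried the Krt5-CreERT2 label.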

Given the infrequency of ΔNp63+ cells, we suspected progenitor activity
of the CC10− β4+ population might be restricted to a smaller subset.
Immunostaining revealed multicilia in 78% of this population, whereas
ΔNp63+ cells were less than 1% (Extended Data Fig. 5b). To address this
heterogeneity, we performed single-cell RNA sequencing (RNA-seq)
on CC10− β4+ cells and on rare Krt5-CreERT2-labelled cells, a subset
of this population (Fig. 2i). The ΔNp63 transcript was detected in several
cells in the CC10− β4+ population (Fig. 2b, red circle, far left) as
well as the Krt5-traced cells (Fig. 2b, green triangle). Analysis of variance
(ANOVA) comparison between putative LNEPs (Krt5-traced cells
combined with all p63-expressing cells) and the remaining cells revealed
enrichment of ~900 genes (>2-fold change, >1 FPKM, P = 0.05) in
the LNEP group (Supplementary Data 1). We note enrichment for
pluripotency-associated transcription factors (Myc and Klf4) in the LNEP
group (Fig. 2b), while many genes enriched in the remaining cells (Fig. 2b,
top rows, Supplementary Data 2) are known markers of ciliated cells.
Surprisingly, ΔNp63+ CC10− β4+ cells most closely related to the Krt5-traced
cells also expressed cilia-associated genes (Fig. 2b, denoted
by asterisk). Cytospins of CC10− β4+ cells revealed primary cilia on
ΔNp63+ cells and additional cells without discernible ΔNp63 (Fig. 2c,
right), indicating that the LNEP profile extends to a larger fraction of
ΔNp63-low or -negative cells.

To assess the potential of LNEPs in vivo, we devised a transplantation
assay by which ~10⁵ fluorescent CC10− β4+ cells were delivered
orthotopically into influenza-injured mice (Fig. 2d). Seeded LNEPs
developed into multicellular structures in two patterns seemingly dependent
on location: areas of type II cells that were virtually indistinguishable
29 JANUARY 2015 | VOL 517 | NATURE | 621


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 




Figure 1 | Injury-induced Krt5+ cells are derived from a lineage-negative
precursor. a, Schematic depicting lineage analysis methodology. b, c, Krt5+
cells are untraced (GFP-negative) after influenza injury in CC10-CreERT2/
mTmG mice. D, day. d, e, Quantification of CC10 and SPC lineage tracing,
expressed as a percentage of cells counted bearing the respective lineage tag
(see Methods). Short chase time after tamoxifen administration to
CC10-CreERT2 mice results in significant trace in Krt5+ cells (e) (Supplementary


from surrounding endogenous type II cells (Extended Data Fig. 6a, b, h),
and engraftments expressing Krt5 (Extended Data Fig. 6a, c) and CC10
(Extended Data Fig. 6g) near endogenous Krt5⁺ CC10⁺ structures. By
contrast, type II cells engrafted infrequently in small clusters (<8 cells), and
expressed only alveolar markers (Extended Data Fig. 6i). CC10⁺ cells could
engraft but exhibited scant differentiation, even losing CC10 expression
(Extended Data Fig. 6j, k). Transplantation of multi-ciliated cells
resulted in only occasional persistence of single cells without structures
(Extended Data Fig. 6l), consistent with their lack of progenitor
properties.

Transplantation of mixed enhanced green fluorescent protein (eGFP)- 
labelled and tdTomato-expressing LNEPs demonstrated engraftments 
to be largely non-overlapping (Fig. 2e) and highly proliferative (Extended 
Data Fig. 6e), arguing for near-clonal expansion. Although mature 
type II cells do not express integrin β4 (ref. 13), clones derived from
donor LNEPs exhibited β4 and SPC co-expression 5 days after transplant
(Extended Data Fig. 6e), confirming their LNEP origin. These data
demonstrate multipotency of LNEPs as well as the viability of orthotopic 
cell transplantation as a functional tool. 

We interrogated the RNA-seq analysis and identified enrichment
for CD14 in ΔNp63⁺ CC10⁻ β4⁺ cells (Fig. 2b, asterisk). In combination
with CD200, which further selects against multi-ciliated cells (Extended
Data Fig. 5e), CD14⁺ cells were isolated and transplanted. β4⁺
CD200⁺ CD14⁺ cells (~3,000) (Fig. 2f) phenocopied the larger (150,000)
CC10⁻ β4⁺ population (Fig. 2g, h and Extended Data Fig. 7a–c), validating
this small population as the active LNEPs. Using a complementary
approach, distal Krt5-CreERT2-labelled ΔNp63⁺ cells within the
LNEP fraction were transplanted (1,000 cells per mouse) (Fig. 2i). Multipotency
was again observed, although we noted many fewer SPC-expressing
cells (Fig. 2j, k and Extended Data Fig. 7d, e). We posit that
Discussion). Data are mean ± s.d., n = 7 CC10-CreERT2 and n = 3 SPC-CreERT2
mice quantified. f, g, A small fraction of Krt5⁺ cells bear the
Krt5-CreERT2 trace (tdTomato⁺), quantified in g (n = 3 Krt5-CreERT2 mice).
h, Krt5⁺ cells are not fluorescent after lung transplantation from a wild-type
donor into a tdTomato recipient. Non-transplanted lung tissue retained
fluorescence (inset). Image representative of one lung transplant.
Scale bars, 20 μm. Source data available online.


isolation using Krt5-driven Cre enriches for LNEPs that have undergone 
partial commitment to the Krt5 program, whereas surface-marker- 
based selection represents a less biased approach. This is consistent with 
lineage analysis (Fig. 1h) indicating Krt5-CreERT2-traced cells can only 
account for a small fraction of the Krt5⁺ expansion.

Accordingly, LNEPs cultured ex vivo did not express Krt5 even when 
treated with various trophic/morphogenic factors (Supplementary Table 2). 
However, bronchoalveolar lavage fluid (BALF) from injured mice induced
marked proliferation and Krt5 expression. A total of 77 ± 13%
(mean ± s.d.) of colonies treated with the BALF stained positive for
Krt5 (Fig. 3a–d), whereas type II cells treated with the same BALF did
not respond. 

Although the active principle(s) in injury BALF is uncertain, a screen
of pathway inhibitors implicated a critical role of Notch. The γ-secretase
inhibitor DAPT in conjunction with active BALF attenuated intensity
and reduced the Krt5⁺ colony fraction (22.7 ± 13%) (Fig. 3c, d). This
prompted us to analyse Notch activity in vivo. Notch1 intracellular domain
(ICD) and the canonical Notch signalling target Hes1 were evident
in the nucleus of parenchymal Krt5⁺ cells after influenza (Fig. 3e, f).
Notch activity was further validated using a Notch reporter mouse (Cp-eGFP)
(Extended Data Fig. 8a, b). When DAPT was administered to
mice after influenza, the fraction of lung area bearing Krt5⁺ cells by
day 11 was markedly reduced (Fig. 3g and Extended Data Fig. 7h).

During development, Notch signalling is known to suppress alveolar
differentiation in both the lung and the mammary gland. When LNEPs
were cultured in the presence of γ-secretase inhibitors, we observed
strong induction of SPC expression, further promoted by 3-isobutyl-1-methylxanthine
(IBMX) (Fig. 3h, i). Therefore, persistent Notch
signalling prevents alveolar differentiation, whereas removal of this
signal promotes maturation towards type II cells. This result proved
Figure 2 | Isolation and transplantation of a lineage-negative distal
epithelial population. a, FACS segregation of epithelial (EpCAM⁺) cells by β4
expression and a CC10-CreERT2 lineage tag (GFP), demonstrating a β4⁺
population distinct from club cells. b, Hierarchical clustering/heat map of
RNA-seq transcriptomes from single CC10⁻ β4⁺ cells (circles) and distal
Krt5-CreERT2-traced cells (triangles) (columns). Listed genes (rows) were
selected from >1,200 differentially expressed genes identified by ANOVA.
c, Immunofluorescent staining for ΔNp63 in uninjured lungs from Krt5-CreERT2
(tdTomato⁺) mice. Single cells from cytospins of the CC10⁻ β4⁺
population demonstrate primary cilium (green) in a subset of non-multiciliated


relevant to the long-term outcome of regeneration in the influenza in- 
jury model. 

Although regions of relatively normal histology bearing the Krt5-CreERT2
trace develop after resolution of bleomycin injury (Extended
Data Fig. 3e), we were surprised to find few traced SPC⁺ type II cells
after influenza (Extended Data Fig. 8e). Instead, large regions of Krt5-Cre-traced
epithelial cysts were present in all mice examined between
days 52 and 200 after injury (n = 7 mice) (Fig. 4a). These cysts consisted
of CC10⁺ cells, scattered Krt5⁺ cells, otherwise nondescript epithelial
cells, but very few SPC⁺ cells (Fig. 4a and Extended Data Fig. 8c–e),
raising the possibility of ongoing Notch activity in cystic epithelium.
Strong Hes1 expression persisted in Krt5-CreER-traced cyst-derived
cells indefinitely (Fig. 4b and Extended Data Fig. 8f), whereas it is normally
undetectable in alveolar epithelium (Extended Data Fig. 8g). The
same correlation was observed in transplant experiments: LNEP-derived
Krt5⁺ and CC10⁺ areas exhibited strong Hes1 expression, whereas it
was low or absent in areas of type II cell differentiation (Extended Data


cells (right). d, Schematic depicting orthotopic cell transplantation
methodology. e, Transplantation of LNEPs combined from eGFP- or
tdTomato-expressing donors into a single recipient. Most engrafted regions are
exclusively GFP⁺ (green) or tdTomato⁺ (red), suggesting clonal expansion.
f–h, FACS isolation and transplantation of β4⁺ CD200⁺ CD14⁺ LNEPs (f).
Transplanted cells differentiate into both SPC⁺ (g) and Krt5⁺ (h) cells,
representative of n = 3 transplants. i–k, FACS-isolated and transplanted
Krt5-CreERT2-labelled LNEPs also differentiate into SPC⁺ (j) and Krt5⁺ (k)
cells, representative of n = 2 transplants. Scale bars, 20 μm (c, left, g, h, j, k),
10 μm (c, right) and 100 μm (e).


Fig. 6b–d). Notch antagonism in vivo via intranasal delivery of dibenzazepine
(DBZ; in conjunction with dexamethasone and IBMX) resulted
in a significant increase in the number of cyst-derived SPC⁺ cells (12.3%
versus 1.6%) (Fig. 3j–l).

Persistent cysts bear a strong resemblance to micro-honeycombing
in the lungs of patients with idiopathic pulmonary fibrosis (IPF) (Fig. 4c, d).
In these lungs (n = 10), almost all cystic epithelia were composed
of KRT5⁺ cells surrounded by either additional metaplastic KRT5⁻
cells or pseudostratified epithelium with ectopic but otherwise typical
basal cells. Distinct foci of hyperplastic SPC⁺ cells were also present.
Notch activity correlated with KRT5⁺ cysts but was absent in most
hyperplastic SPC⁺ cells (Fig. 4d–f and Extended Data Fig. 9a–d) and in
normal alveolar regions (Extended Data Fig. 9k).

In lungs from patients with scleroderma (n = 7), fibrotic areas displayed
the IPF pattern of persistent KRT5⁺ and HES1⁺ cystic structures
(Extended Data Fig. 9e, g, h). However, in three less-fibrotic specimens,
we observed extensive double-positive KRT5⁺ and SPC⁺ cells lining
Figure 3 | Activation and Krt5 expression by lineage-negative progenitors is
Notch-dependent. a–d, LNEP colonies upregulate Krt5 only after stimulation
with BALF from influenza-injured mice (b) (n = 6 experiments), a process
blocked by γ-secretase inhibition with DAPT (c) (n = 4 experiments),
quantified in d. e, f, Hes1 (e) and Notch1 intracellular domain (ICD) (f) are
present in the nucleus of Krt5⁺ cells at day 11, indicating Notch activity.
g, γ-secretase inhibition during influenza injury reduces Krt5⁺ cell activation
and expansion as measured by the fraction of lung section area. Each
dot denotes one section; two sections per mouse, n = 5 (vehicle) or 4 (DAPT)
mice per group. h, i, γ-secretase inhibition induces SPC expression in LNEPs
in vitro, quantified in i (n = 3 experiments). h, Bottom, the Krt5-CreERT2
lineage label could be observed in SPC⁺ cells after DAPT treatment in vitro.
j–l, Representative images of Notch inhibition in vivo via intranasal
administration of DBZ, which results in a significant increase in Krt5-CreERT2-traced
SPC⁺ cells (l) versus labelled cells in vehicle-treated mice (k) after influenza,
quantified in j (n = 2 mice per group, >900 cells quantified per mouse in
two sections from two separate lobes). IB denotes influenza BALF.
Scale bars, 20 μm. Data are mean ± s.d. Source data for g and j are
available online.


alveoli (Fig. 4g and Extended Data Fig. 9f). Although the origin of this
KRT5⁺ expansion is uncertain in humans, we note ΔNp63⁺ KRT5⁻
cells in normal terminal airways (Extended Data Fig. 9i), analogous to
LNEPs in mice.

These experiments identify a rare, undifferentiated epithelial popu- 
lation that is the major responder in distal lung after severe damage 
(Extended Data Fig. 10a). Notch signalling modulates the quiescence, 
activation and differentiation state of murine LNEPs (Extended Data 
Fig. 10b), providing a signalling model to frame the dynamic aspects of 
LNEP function. The persistently abnormal parenchymal structures that 
derive from LNEPs after influenza infection represent a failed regen- 
erative process, promoted at least in part by ongoing Notch activity. 
The striking parallels to the currently inexplicable micro-honeycombing 
Figure 4 | Persistent Notch activity promotes cystic honeycombing in both
mouse and human. a, Krt5-CreERT2-traced (tdTomato⁺) cells develop into
cystic structures at late time points after influenza. b, Cyst cells demonstrate
nuclear expression of Hes1 indicative of persistent Notch signalling. c, Lung
from patient with IPF bearing honeycomb cysts with mutually exclusive
KRT5⁺ and SPC⁺ cells. d, IPF honeycomb cysts with nuclear HES1 in KRT5⁺
cells and surrounding epithelium, similar to mouse (b). e, SPC⁺ type II cells in
hyperplastic foci infrequently express HES1, quantified in f (n = 8 patients,
mean ± s.d.). g, Scleroderma lung demonstrating sub-pleural KRT5⁺ and
SPC⁺ cell expansion with many KRT5⁺ SPC⁺ double-positive cells (right).
Scale bars, 20 μm (a, b, d, e, g) and 100 μm (c).


that characterizes progressive fibrotic lung disease, including hyper- 
active Notch, suggest inappropriate Notch signalling may also be a major 
contributor to failed regeneration in chronic lung disease. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Animals. SPC-CreERT2 (Sftpc^tm1(cre/ERT2,rtTA)Hap), Krt5-CreERT2
(Krt5^tm1.1(cre/ERT2)Blh), CC10-CreERT2 (Scgb1a1^tm1(cre/ERT)Blh), FoxJ1-CreERT2
(Tg(FOXJ1-cre/ERT2)1Blh) and Cp-eGFP (Tg(Cp-EGFP)25Gaia) mice were previously
described. All of these strains were bred to either mTmG
(Gt(ROSA)26Sor^tm4(ACTB-tdTomato,-EGFP)Luo) (ref. 21) or Ai14-tdTomato
(Gt(ROSA)26Sor^tm14(CAG-tdTomato)Hze) (ref. 22) mice to generate mice expressing
a fluorophore in Cre-expressing cells. SPC-CreERT2, Krt5-CreERT2 and CC10-CreERT2
mice were developed in a 129 background and backcrossed to C57BL/6 for at
least three generations. For transplant experiments, mTmG and/or Ub-GFP
(Tg(UBC-GFP)30Scha) mice were used for donor cells. For all experiments, 6–8-week-old animals
of both sexes were used in equal proportions. Investigators were not blinded to
mouse identity. All studies were approved by University of California, San Francisco
(UCSF) Institutional Animal Care and Use Committees (IACUC), protocol
AN088356-03. All animal studies used a minimum of three mice per group with
the exception of DBZ (see below).

For lineage analysis of the cell of origin of Krt5⁺ cells, mice were administered
three doses (Krt5-CreERT2) or five doses (SPC-CreERT2 and CC10-CreERT2) of
0.25 mg g⁻¹ body weight tamoxifen in 50 μl corn oil. A chase period of >21 days
was used to ensure the absence of residual tamoxifen before injury.
Injury (influenza, bleomycin). Mice were administered 280 focus-forming units
(FFU) of influenza A/H1N1/Puerto Rico/8/34 (PR8) intranasally. PR8 virus dissolved
in 30 μl PBS was pipetted onto the nostrils of heavily anaesthetized mice (visual
confirmation of agonal breathing), whereupon mice aspirated the fluid directly into
their lungs. The mice were allowed to recover and weighed twice a week. For experiments
analysing the lineage fate of Krt5⁺ cells, a single dose of 0.125 mg g⁻¹
body weight tamoxifen was administered at day 10 after PR8 infection.

Infective viral particles were assayed by inoculation of either stock virus or homogenate
(in 1 ml PBS) of left lung, spleen and brain onto 96-well plates of confluent
MDCK cells. After 1 h, samples were decanted and replaced with serum-free media
containing modified trypsin (L-(tosylamido-2-phenyl) ethyl chloromethyl ketone
(TPCK)-treated) at 100 μg ml⁻¹. Fifteen hours later, the cells were fixed in 100% methanol,
and then subjected to indirect immunocytochemistry using Millipore mouse
anti-influenza A (MAB 8257) at 1.25 μg ml⁻¹, followed by Vector 102 biotinylated
horse anti-mouse, and the biotin/avidin system (PK-4002) with diaminobenzidine
as a chromogen. Samples were processed in triplicate over dilutions, and foci
were counted in wells that yielded 30–100 discrete foci.
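
The counting rule above (triplicate wells, using only wells with 30–100 discrete foci) implies a titre calculation along the following lines. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name `ffu_per_ml`, the inoculum volume per well and the example counts are hypothetical, since the text does not give them.

```python
from statistics import mean


def ffu_per_ml(foci_counts, dilution_factor, inoculum_ml, lo=30, hi=100):
    """Estimate a titre (FFU per ml) from replicate wells at one dilution.

    foci_counts     -- foci counted in each replicate well
    dilution_factor -- fold dilution of the sample (e.g. 1e4 for 1:10,000)
    inoculum_ml     -- volume plated per well in ml (assumed; not in the text)
    lo, hi          -- countable range of foci per well (30-100 in the text)
    """
    countable = [c for c in foci_counts if lo <= c <= hi]
    if not countable:
        raise ValueError("no wells fall in the countable range")
    return mean(countable) * dilution_factor / inoculum_ml


# Hypothetical triplicate at a 1:10,000 dilution with 0.05 ml plated per well.
titre = ffu_per_ml([42, 51, 48], dilution_factor=1e4, inoculum_ml=0.05)
print(f"{titre:.2e} FFU/ml")
```

Wells outside the countable range are simply dropped before averaging, which mirrors the stated rule that foci were counted only in wells yielding 30–100 discrete foci.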

Bleomycin (1.7 U kg⁻¹ body weight) was administered intratracheally. Mice were
weighed twice a week. For lineage tracing Krt5⁺ cells after bleomycin, a single dose
of 0.125 mg g⁻¹ body weight tamoxifen was administered at day 17 after bleomycin.
Treatment of animals with γ-secretase inhibitors. For DAPT administration, mice
received 50 mg kg⁻¹ body weight DAPT in 20 μl dimethylsulphoxide (DMSO) per
intraperitoneal injection, for the indicated periods. For intranasal DBZ administration,
30 μmol kg⁻¹ body weight DBZ was suspended in 50 μl sterile PBS and
sonicated in a Bioruptor UCD-200 for 15 min total, 30 s intervals on ice. In DBZ
experiments, both DBZ and vehicle groups also received 2.5 μg g⁻¹ body weight
dexamethasone (Sigma) in the intranasal solution and 10 mg kg⁻¹ IBMX (Sigma)
intraperitoneally daily. Both DAPT and DBZ were obtained from Toronto Research
Chemicals. More than 7,500 Krt5-CreERT2-labelled cells were quantified for SPC
expression in at least two individual lobes from each mouse.

Orthotopic lung transplantation. Left lung transplants were carried out using the
method described previously. The donor animal was anaesthetized and injected
with heparin (50 U) immediately before perfusion of the lung vasculature with 5 ml 
of ice-cold Perfadex solution (Xvivo Perfusion), clamping of the hilar structures, 
and removal from the donor animal. The left lung was transplanted into the recip- 
ient animal using the cuff anastomosis technique. 

Orthotopic tracheal transplantation. The donor animal was anaesthetized and 
with the aid of microscopic dissection, a segment of trachea composed of 5-7 tra- 
cheal rings was removed. The recipient animal was anaesthetized and the donor 
trachea was interposed using proximal and distal anastomoses.
Immunofluorescence analysis of tissue. After euthanasia, lungs were either
immediately inflated with OCT and flash frozen or inflated with 4% paraformaldehyde
(PFA) and fixed for 1 h at room temperature and subsequently embedded in
OCT. Sections (7 μm) were cut on a cryostat, with fresh-frozen tissue immediately
fixed for 5 min in 4% paraformaldehyde at room temperature. All sections were subsequently
incubated for 3 × 10-min intervals with 1 mg ml⁻¹ sodium borohydride
(Sigma) in PBS to reduce aldehyde-induced background fluorescence. Slides were
subsequently blocked for ≥1 h in PBS plus 1% bovine serum albumin (Affymetrix), 5%
nonimmune horse serum (UCSF Cell Culture Facility), 0.1% Triton X-100 (Sigma)
and 0.02% sodium azide (Sigma). Slides were incubated overnight in primary antibodies
listed below, diluted in block solution. Slides were washed three times with
PBS plus 0.1% Tween 20, and incubated with secondary antibodies (typically Alexa
Fluor conjugates, Life Sciences) at a 1:2,000 dilution for ≥1 h. Finally, slides were again
washed, incubated with 1 μM DAPI for 5 min, and mounted using Prolong Gold
(Life Sciences).

The following antibodies were used: rabbit anti-proSPC (1:3,000; Millipore,
AB3786), goat anti-proSPC (1:2,000; Santa Cruz, M-20), goat anti-CC10 (1:10,000;
a gift from B. Stripp), rabbit anti-Krt5 (1:1,000; Covance, PRB-160P), chicken anti-Krt5
(1:1,000; Covance, SIG-3475), rabbit anti-ΔNp63 (1:100; Biolegend, POLY6190),
rat anti-CD45 (1:200; BD, 30-F11), sheep anti-eGFP (1:500; Pierce, 10396164), rabbit
anti-phospho histone H3 (1:500; Millipore, 06-570), rabbit anti-Hes1 (1:1,000; Cell
Signaling, D6P2U), rabbit anti-activated Notch1 (1:1,000; Abcam, ab8925), mouse
anti-acetylated tubulin (1:500; Sigma, 6-11B-1).

Quantification of lineage tracing. Samples were prepared for immunofluores- 
cence staining. Quantification at day 11 after influenza is the result of counting 
>2,900 cells (CC10 trace), >4,000 cells (SPC trace), or >1,300 (Krt5 trace) from at 
least three mice per genotype. Cells were counted from over five sections per mouse 
and included at least three individual lobes. Mutual exclusivity of CC10-traced and 
Krt5* cells at days 7-8 was determined with a smaller sample size, n = 2 mice, 12 
Krt5* airways, >500 cells examined. Only mice possessing the appropriate geno- 
type were used in studies. 
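
The lineage-trace quantification described above reduces to a per-mouse percentage of counted cells bearing the lineage tag, summarized as mean ± s.d. across mice. The following sketch shows one way to compute it; the helper names and the counts are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, as the text does not specify which form was used.

```python
from statistics import mean, stdev


def percent_traced(traced, counted):
    """Percentage of counted cells that carry the lineage tag."""
    if counted <= 0:
        raise ValueError("counted must be positive")
    return 100.0 * traced / counted


def summarize(per_mouse):
    """Summarize lineage tracing across mice.

    per_mouse -- list of (traced, counted) tuples, one tuple per mouse
    Returns (mean, sample s.d.) of the per-mouse percentages.
    """
    pcts = [percent_traced(t, n) for t, n in per_mouse]
    return mean(pcts), stdev(pcts)


# Hypothetical counts for three mice, not the paper's data.
m, sd = summarize([(10, 500), (15, 500), (20, 500)])
print(f"{m:.1f} ± {sd:.1f}%")
```

Averaging per-mouse percentages, rather than pooling all cells, treats each mouse as the unit of replication, which matches the way n is reported as a number of mice.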

Epithelial cell isolation and flow cytometry. Lung epithelial cells were isolated as
previously described, with the following modifications. After instillation with
agarose and subsequent hardening by a brief incubation on ice, each lobe was cut
away from the mainstem bronchi. The proximal-most quarter of each lobe surrounding
the bronchi was then cut away to minimize the inclusion of basal cells in
the cell preparation, and the previous protocol was followed from this point on.

For FACS analysis, single-cell preparations were incubated for 30–45 min at 4 °C
with the following primary antibodies: phycoerythrin (PE), Alexa Fluor 488, or
BV421-conjugated rat anti-mouse EpCAM (1:500; Biolegend, G8.8), Alexa Fluor
647 or PE-conjugated rat anti-mouse integrin β4 (1:75; BD, 450-9D), Alexa 647-conjugated
CD200 (1:100; Biolegend, OX-90), and PE/Cy7-conjugated CD14 (1:100;
Biolegend, Sa14-2). Antibody incubations were done in DMEM (without phenol
red) plus 2% FBS, and cells were washed twice with PBS after antibody incubations.
Sorting and analysis were performed on BD FACS Aria cytometers.

Orthotopic cell transplantation. Recipient C57BL/6 mice were infected with PR8
(see Animals). At 9 days after infection, donor cells were sorted from mTmG or
Ub-GFP mice (Animals) and resuspended in 50 μl sterile PBS. Recipient mice received
cell solution intranasally as described above for influenza administration. The
total number of β4⁺ cells ranged from 150,000 to 350,000 per transplant (n = 6),
and equivalent numbers of β4⁻ cells were always transplanted into injured littermates
for comparison. For transplantation of Krt5-CreERT2-labelled cells, 1,000
cells were transplanted per recipient (n = 2). For β4⁺ CD14⁺ CD200⁺ cell transplants,
3,000–10,000 cells were transplanted per mouse (n = 3). FoxJ1-CreERT2-labelled
or CC10-CreERT2-labelled cell transplants were performed in n = 3 or 4
mice each, respectively (1 × 10⁵–3 × 10⁵ cells per mouse). Endpoint analysis was
performed at day 21 after infection unless otherwise noted. For analysis of proliferation,
recipient mice were administered 50 mg kg⁻¹ body weight EdU (Santa
Cruz) in PBS daily. EdU was detected with Click-iT EdU Alexa Fluor 488 Imaging
Kit (Invitrogen).

Primary cell culture. Isolated primary lung epithelial cells were plated and cultured
on Matrigel as follows. Eight-well chamber slides were coated with 150 μl Matrigel
per well, allowed to solidify at 37 °C, and then equilibrated with SABM (Lonza) for
at least 30 min before cell plating. A total of 15,000–40,000 cells were plated in each
well and maintained in ‘baseline’ media consisting of SAGM (Lonza) supplemented
with 5% charcoal-stripped FBS and 10 ng ml⁻¹ KGF (FGF-7, Peprotech).
Other growth factors were included in the media only when indicated and are summarized
in Supplementary Table 2.

BALF was collected from injured animals for cell culture as follows. Euthanized
mice were intratracheally intubated before cardiac perfusion and 1 ml of baseline
media was lavaged. The lungs were repeatedly lavaged with the media at least three
times. BALF was then centrifuged three times for 5-min spins at 1,500g to remove
the cells and other debris. Clarified BALF was finally filtered through a 0.25-μm
Spin-X filter (Sigma) to remove any additional debris and to ensure a cell-free preparation.
BALF prepared in this way was either added to cells immediately or frozen
in aliquots at −80 °C and added to cultured cells without dilution.
Long-term cell culture. Cells isolated as above were maintained in SAGM as above,
with the addition of 10 μM Y-27632 (Sigma) and 50 ng ml⁻¹ murine noggin (Peprotech).
Cells were passaged every 7–10 days by initial incubation with 25 U ml⁻¹
dispase at 37 °C for 20 min to liberate colonies. Single-cell dissociation was performed
by additional 10-min incubation with 2 mM EDTA in PBS in combination
with mechanical disaggregation by pipetting.
γ-secretase treatment of LNEPs in vitro. LNEPs maintained as above were dissociated
and re-plated directly into SAGM baseline media with added DAPT or
GSI-X (Calbiochem) at 40 or 20 μM concentrations (unless otherwise indicated).
For SPC induction experiments, IBMX was added when indicated. LNEPs were
cultured for 7–10 days and then analysed by immunofluorescent staining.
Immunofluorescence analysis of cultured cells. Cells grown on Matrigel were
fixed for 5–10 min in IHC Zinc Fixative (BD) and subsequently stained as indicated
above, except that all staining solutions were prepared with TBS as the zinc fixative
reacts with phosphate.

Live slice imaging. Krt5-CreERT2/tdTomato mice were administered 280 FFU
PR8 (as above) and received a single 0.25 mg kg⁻¹ dose of tamoxifen 24 h before
culling at the indicated time points. Injured mice were euthanized and perfused and
lavaged with PBS. Lungs were instilled with 2% low-melting point agarose and
~300-μm slices were prepared on a vibratome. Lung slices were maintained in
SAGM plus 10 ng ml⁻¹ KGF during imaging with the addition of 500 nM hydroxytamoxifen
(Sigma) to induce recombination in all Krt5-expressing cells. Slices
were imaged continuously for 12 h in a 37 °C chamber on an inverted stage with a
Leica SP5 confocal microscope. Images obtained were deconvoluted with Bitplane
Imaris for presentation.

Quantitative reverse transcriptase PCR. RNA was isolated from sorted cells
using the Promega RNA Reliprep kit. cDNA was synthesized and amplified using
the Ovation PicoSL WTA V2 kit (NuGen). Reverse transcription PCR (RT-PCR)
reactions were performed using Faststart Universal SYBR green Master Mix (Roche)
and run on an Eppendorf Realplex thermocycler. Primer sequences are as listed:
SPC (also known as Sftpc), forward, 5'-ATGGACATGAGTAGCAAAGAGGT-3',
reverse, 5'-CACGATGAGAAGGCGTTTGAG-3'; CC10 (also known as Scgb1a1),
forward, 5'-ATGAAGATCGCCATCACAATCAC-3', reverse, 5'-GGATGCCACATAACCAGACTCT-3';
Krt5, forward, 5'-TCCAGTGTGTCCTTCCGAAGT-3',
reverse, 5'-TGCCTCCGCCAGAACTGTA-3'; ΔNp63, forward, 5'-ATGTTGTACCTGGAAAACAATGCC-3',
reverse, 5'-CAGGCATGGCACGGATAAC-3'; Jag1,
forward, 5'-CCTCGGGTCAGTTTGAGCTG-3', reverse, 5'-CCTTGAGGCACACTTTGAAGTA-3';
Jag2, forward, 5'-CAATGACACCACTCCAGATGAG-3',
reverse, 5'-GGCCAAAGAAGTCGTTGCG-3'; Hey1, forward, 5'-GCGCGGACGAGAATGGAAAA-3',
reverse, 5'-TCAGGTGATCCACAGTCATCTG-3'; Aqp5,
forward, 5'-AGAAGGAGGTGTGTTCAGTTGC-3', reverse, 5'-GCCAGAGTAATGGCCGGAT-3';
Abca3, forward, 5'-GCTTGAAGATCCAGTCGGAGA-3',
reverse, 5'-CATAGCGAATGTAGTCCTCGAAG-3'; Cgrp (also known as Calca),
forward, 5'-GCGGGCTCTAGCTTGGACAG-3', reverse, 5'-AAGGTGTGAAACTTGTTGAGGT-3';
Sox2, forward, 5'-GCGGAGTGGAAACTTTTGTTC-3',
reverse, 5'-GGGAAGCGTGTACTTATCCTTCT-3'; Foxj1, forward, 5'-CATCTACAAGTGGATCACGGAC-3',
reverse, 5'-GAGCAGGCGCTCTGCGTACTG-3'.
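
Because primer sequences are easily corrupted when copied or re-typeset, a quick sanity check on each sequence (valid bases, length, GC content) can be useful before ordering or reuse. The sketch below is an illustration rather than part of the described method: the helper names are hypothetical, and only two of the primer pairs listed above are included as examples.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Two primer pairs copied from the list above as (forward, reverse), 5'->3'.
PRIMERS = {
    "Krt5": ("TCCAGTGTGTCCTTCCGAAGT", "TGCCTCCGCCAGAACTGTA"),
    "Jag2": ("CAATGACACCACTCCAGATGAG", "GGCCAAAGAAGTCGTTGCG"),
}

for name, (fwd, rev) in PRIMERS.items():
    for label, seq in (("forward", fwd), ("reverse", rev)):
        valid = set(seq) <= set("ACGT")  # flags stray characters from copying
        status = "valid" if valid else "INVALID"
        print(name, label, len(seq), f"GC={gc_fraction(seq):.2f}", status)
```

The `reverse_complement` helper gives the template-strand sequence that a reverse primer anneals against, which is the form needed when locating the primer in a reference transcript.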

Single-cell RNA-seq. Distal lung epithelial cells were isolated and FACS sorted as
described above from CC10-CreERT2/mTmG mice. In addition, tdTomato⁺ cells
were sorted from tamoxifen-treated Krt5-CreERT2/tdTomato mice. Sorted single
cells were captured on a medium-sized (10–17 μm cell diameter) microfluidic RNA-seq
chip (Fluidigm) using the Fluidigm C1 system. All downstream steps (lysis,
cDNA synthesis/amplification, library preparation, sequencing and raw data processing)
were performed exactly as previously described. Fragments per kilobase
of exon per million fragments mapped (FPKM) files for each cell were analysed
using Fluidigm Singular software running in R.

Human tissues. All human tissue samples were obtained from UCSF Interstitial 
Lung Disease Blood and Tissue Repository and are classified as Non-identifiable 
Otherwise Discarded Human Tissues. 

Statistics. For calculations involving single-cell RNA-seq, Fluidigm Singular software
running in R was used. All other statistical calculations were performed using
GraphPad Prism. P values were calculated from two-tailed t-tests (paired or unpaired
depending on experimental design) or ANOVA for multivariate comparisons. Variance
was analysed at the time of t-test analysis. These data are not included in the
manuscript but are available upon request. No statistical method was used to predetermine
sample size.
Extended Data Figure 1 | Characterization of influenza-induced Krt5⁺
cells. a–c, Alveolar (a, b) and airway (c) Krt5⁺ cells strongly express β4 after
influenza injury. d, FACS plot of epithelial (EpCAM⁺) cells from tamoxifen-treated
Krt5-CreERT2/tdTomato mice at day 15 after influenza, demonstrating
β4 expression in nearly all traced (tdTomato⁺) cells. e, f, Most Krt5⁺ cells co-express
ΔNp63 (e) and Krt14 (f). g, h, Expanded Krt5⁺ cells are invariably
associated with abundant CD45⁺ inflammatory cells (g) and few if any
remaining normal E-cadherin⁺ epithelial cells other than the Krt5⁺ cells
themselves (h). i, Krt5⁺ cells are unlabelled in SPC-CreERT2/mTmG mice.
Inset in i demonstrates appropriate labelling of type II cells in an uninjured
region of the same lung. j, k, Krt5⁺ cells are not fluorescent after trachea
transplantation from tdTomato donor. Basal cells in transplanted section of
trachea retained fluorescence (j, inset in k). Scale bars, 100 μm (a) and
20 μm (b, c, e–k).
Extended Data Figure 2 | Influenza-induced Krt5⁺ cells arise in both
airways and alveoli and migrate across, around and through airway and
parenchymal tissue. a, b, Krt5⁺ cells are detected in alveoli as early as day 5
and are found in larger clusters over time. c, d, Krt5⁺ cells similarly arise in
airways in greater abundance with time. e, Distinct alveolar and airway
expansion is apparent 11 days after infection. f, Freeze-frames of live imaging
from a Krt5-CreERT2/tdTomato mouse 11 days after influenza, in which
tdTomato⁺ cells migrate from their original location (white box) outward.
See Supplementary Video 1. g, Freeze-frames from a small airway in the same
mouse; arrow denotes a single cell crossing the basement membrane.
See Supplementary Video 2. Scale bars, 20 μm (a, b, g) and 100 μm (c, d, f).
Extended Data Figure 3 | Characterization of bleomycin-induced Krt5⁺
cells. a–c, β4⁺ Krt5⁺ cells also arise after bleomycin injury and express ΔNp63
(b, c). d, Western blotting demonstrating more pronounced and reproducible
Krt5 induction after influenza injury at day 11 than after bleomycin injury
at day 17. Each lane was loaded with whole-lung lysate from a single mouse;
average percentage lung area corresponding to a band in influenza-injured
mice is 3.6 ± 0.5% (n = 13 mice quantified, see Fig. 3g as an example).
e, Lineage tracing of bleomycin-injured Krt5-CreERT2 mice reveals traced
(tdTomato⁺) type II cells expressing SPC and cells morphologically resembling
type I cells. In total, 31% of Krt5-CreERT2-traced cells express SPC by
day 50 after bleomycin (n = 3 mice, 264 Krt5-CreERT2-labelled cells counted).
Scale bars, 100 μm (a) and 20 μm (b, c, e). Full western blot scan in d is
available as Supplementary Fig. 1.
Extended Data Figure 4 | Krt5⁺ cells do not arise from CC10-expressing
progenitors but rather upregulate CC10 during expansion. a, Krt5⁺ cells
express detectable levels of CC10 (top) compared to isotype control (bottom) in
alveolar clusters. b, Representative image of CC10-CreERT2 lineage trace in
which waiting only 7 days after tamoxifen administration before influenza
injury results in significant labelling of Krt5⁺ cells (quantified in Fig. 1d).
c, Strong CC10 expression in Krt5-CreERT2-traced (tdTomato⁺) cells by
day 22 after influenza. For comparison, see single channel images (c, right
and bottom) of the same region. Scale bars, 20 μm.
Extended Data Figure 5 | Heterogeneity of the LNEP-containing CC10⁻
β4⁺ population. a, Rare Krt5-CreERT2-traced (tdTomato⁺) cells were
observed in uninjured distal lung airways that lacked Krt5 staining compared to
trachea basal cells (inset) in the same section. All distal tdTomato⁺ cells express
ΔNp63 but most ΔNp63⁺ cells are untraced (see Fig. 2c). b, Cytospins of sorted
CC10⁻ β4⁺ cells reveal the presence of abundant multiciliated cells (green,
acetylated tubulin⁺) and a small fraction of ΔNp63⁺ cells (red). c, Quantitative
reverse transcriptase PCR (qRT-PCR) analysis of mature lineage genes and
genes of interest in all populations. n = 3 biological replicates; data are
mean ± s.d. d, Principal component analysis plot of cells sequenced in
Fig. 2b, demonstrating that p63⁺ cells in the CC10⁻ β4⁺ population (outlined,
asterisk) cluster with multi-ciliated cells. e, CD200 is not expressed by
FoxJ1-CreERT2-labelled multi-ciliated cells, highlighting its use in excluding
such cells. f, Cytospin of FoxJ1-CreERT2-labelled β4⁺ cells demonstrating
reliable selection for multi-ciliated cells (198 cells quantified). g, Gating on
CD14 expression within the EpCAM⁺ β4⁺ CD200⁺ population excludes
CC10-expressing club cells. Scale bars, 20 μm.

Extended Data Figure 6 | Orthotopic transplantation of LNEPs reveals
their multipotency and differentiation appropriate to the local
microenvironment. a, Several distinct areas of LNEP engraftment (red) reflect
differentiation in response to location. Left dashed box demonstrates SPC
expression in engrafted cells with nearby endogenous SPC-expressing cells
(white); far right dashed box demonstrates Krt5 expression in engrafted cells
and nearby endogenous Krt5-expressing cells (green). b, c, Cells in regions
of SPC+ differentiation (b) lack Hes1 expression (right), whereas those in areas
of Krt5+ differentiation (c) strongly express Hes1 (right). d, Distinct areas of
LNEP engraftment demonstrate an inverse relationship between SPC
expression (left) and Hes1 expression (right) in probable single clones.
e, Examination of transplanted cells 5 days after engraftment demonstrates
abundant EdU incorporation (see Methods) indicative of proliferation. At this
time point cells can be identified co-expressing β4 and SPC (right, circled).
f, g, Krt5+ cells and CC10+ cells were often found clustered in single regions of
engraftment. h, Many engrafted cells in Fig. 2e are also SPC positive. i, β4−
type II cells engraft in small clusters and only express SPC. j, k, CC10~ cells
engraft but do not express SPC, CC10 or Krt5. l, Multi-ciliated cells engraft
but only persist as isolated single cells, losing acetylated tubulin expression.
Scale bars, 100 μm (a) and 20 μm (b–l).

Extended Data Figure 7 | Transplantation of β4+ CD14+ CD200+ cells
and Krt5-CreERT2-traced cells recapitulates multipotency of the
heterogeneous CC10− β4+ population. a, Single-channel images from Fig. 2h
demonstrate Krt5 expression in transplanted β4+ CD14+ CD200+ cells.
b, c, Transplanted β4+ CD14+ CD200+ cells can also differentiate towards type II
cells (b) and club cells (c). d, e, Transplantation of rare Krt5-CreERT2-traced
cells from uninjured mice resulting in donor-derived Krt5+ cell expansion
indistinguishable from endogenous expansion. Images in d and e are
representative images from four attempted transplants, two of which exhibited
engraftment in two or four individual lobes. Scale bars, 20 μm.

Extended Data Figure 8 | Notch activity in normal and injured lung.
a, Uninjured Notch reporter mice (Cp-eGFP) show dim GFP in small airways
and no detectable GFP in alveoli. b, Krt5+ cells arising in distal airways express
GFP in Notch reporter mice 7 days after influenza infection. c, d, Some
Krt5+ cells persist within Krt5-CreERT2-labelled (tdTomato+) cysts (d) long-term
(day 88) after influenza injury, and many traced cells express CC10 (c).
e, Cysts rarely contain SPC+ type II cells (arrows). f, g, Hes1 expression is
maintained in Krt5-CreERT2-traced (GFP+) cyst cells 98 days after influenza
(f) but is absent in normal alveolar parenchyma from the same mice (g).
h, Representative images of Krt5+ cell expansion in vehicle- (left) or
DAPT- (right) treated mice at day 11 after influenza, quantified in Fig. 3g.
Scale bars, 20 μm (a–g) and 100 μm (h).


©2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 9 | IPF and scleroderma lungs both contain HES1+
honeycomb cysts, but scleroderma lungs also possess SPC and KRT5
co-expressing cells. Normal human lungs contain putative LNEPs and lack HES1
in alveoli. a–d, Honeycomb cysts in several IPF lungs; many KRT5+ cells as
well as surrounding cystic epithelium demonstrate strong nuclear HES1 signal.
e, Region of scleroderma honeycombing similar to IPF lung. f, Scleroderma
subpleural alveolar region with type II cell hyperplasia demonstrating cells
co-expressing SPC and KRT5. g, h, Cystic epithelium in scleroderma lungs
expresses HES1 as in IPF. i, KRT5− ΔNp63+ cells (white outlines) distinct
from KRT5+ ΔNp63+ basal cells (red outlines) are present in distal airways.
j, k, HES1 staining is apparent in small airways of normal lung (j) but very
low in alveolar parenchyma (k). All images are from patient samples in
addition to those shown in Fig. 4. Scale bars, 20 μm (a–d, g–k) and
100 μm (e, f).

Extended Data Figure 10 | Hierarchical cellular responses to injury severity
and Notch-regulated LNEP dynamics. a, Distinct epithelial cell types
contribute to regeneration depending on the severity of parenchymal injury.
Examples of each are referenced. b, Notch signalling regulates the activation,
expansion and differentiation of LNEPs. Notch is required for activation and
maintenance of LNEPs. Alveolar differentiation requires subsequent loss of
Notch activity, whereas persistent Notch results in either airway differentiation
or abnormal cystic honeycombing.

IAPP-driven metabolic reprogramming induces
regression of p53-deficient tumours in vivo


Avinashnarayan Venkatanarayan, Payal Raulji, William Norton, Deepavali Chakravarti, Cristian Coarfa,
Xiaohua Su, Santosh K. Sandur, Marc S. Ramirez, Jaehuk Lee, Charles V. Kingsley, Eliot F. Sananikone,
Kimal Rajapakshe, Katherine Naff, Jan Parker-Thornburg, James A. Bankson, Kenneth Y. Tsai, Preethi H. Gunaratne
& Elsa R. Flores


TP53 is commonly altered in human cancer, and Tp53 reactivation
suppresses tumours in vivo in mice (TP53 and Tp53 are also known
as p53). This strategy has proven difficult to implement therapeutically,
and here we examine an alternative strategy by manipulating
the p53 family members, Tp63 and Tp73 (also known as p63 and p73,
respectively). The acidic transactivation-domain-bearing (TA) isoforms
of p63 and p73 structurally and functionally resemble p53,
whereas the ΔN isoforms (lacking the acidic transactivation domain)
of p63 and p73 are frequently overexpressed in cancer and act primarily
in a dominant-negative fashion against p53, TAp63 and TAp73
to inhibit their tumour-suppressive functions. The p53 family interacts
extensively in cellular processes that promote tumour suppression,
such as apoptosis and autophagy; thus a clear understanding
of this interplay in cancer is needed to treat tumours with alterations
in the p53 pathway. Here we show that deletion of the ΔN isoforms
of p63 or p73 leads to metabolic reprogramming and regression of
p53-deficient tumours through upregulation of IAPP, the gene that
encodes amylin, a 37-amino-acid peptide co-secreted with insulin by
the β cells of the pancreas. We found that IAPP is causally involved
in this tumour regression and that amylin functions through the
calcitonin receptor (CalcR) and receptor activity modifying protein 3
(RAMP3) to inhibit glycolysis and induce reactive oxygen species
and apoptosis. Pramlintide, a synthetic analogue of amylin that is
currently used to treat type 1 and type 2 diabetes, caused rapid tumour
regression in p53-deficient thymic lymphomas, representing a novel
strategy to target p53-deficient cancers.

Using ΔNp63 (ref. 15) and ΔNp73 conditional knockout mice (Extended
Data Fig. 1a, b), we generated ΔNp63+/− and ΔNp73−/− mice
(Extended Data Fig. 1c–f). To ask whether the ΔN isoforms of p63 and
p73 act as oncogenes in vivo by interacting with p53, ΔNp63+/−;p53−/−
and ΔNp73−/−;p53−/− mice were aged for the development of thymic
lymphomas, which form in nearly all p53−/− mice (ref. 16). We found a
remarkable diminution in the number and size of thymic lymphomas in
ΔNp63+/−;p53−/− and ΔNp73−/−;p53−/− mice, leading to an extended
lifespan (Extended Data Fig. 2a–c) and suggesting that the ΔN isoforms
of p63 and p73 restrain a tumour-suppressive program that can
compensate for p53 function.

We found that TAp63 and TAp73 were upregulated in thymic lymphomas
from ΔNp63+/−;p53−/− and ΔNp73−/−;p53−/− mice (Extended
Data Fig. 2d, e) along with an upregulation of apoptosis (Extended Data
Fig. 2f–j) and senescence (Extended Data Fig. 2k–o). We also examined
thymocytes from 4-week-old mice after treatment with 10 Gy gamma
irradiation, a dose that is known to elicit p53-dependent apoptosis.
Indeed, TAp63 and TAp73 levels were higher in ΔNp63+/−;p53−/− and
ΔNp73−/−;p53−/− thymocytes, and this increase was more pronounced after
gamma irradiation (Extended Data Fig. 3a–c), with an increase in apoptosis
(Extended Data Fig. 3d–h) and senescence (Extended Data Fig. 3i–m).

To determine whether TAp63 or TAp73 compensate for p53 function
in tumours in vivo, we acutely removed ΔNp63 or ΔNp73 by intratumoral
infection with adenovirus-Cre-mCherry (Extended Data Fig. 4a–d and
Fig. 1a–f) in ΔNp63fl/fl;p53−/− and ΔNp73fl/fl;p53−/− mice at 10 weeks of age.
Tumours were 2.3–5.8 mm³ in size at the time of infection and monitored
weekly by magnetic resonance imaging (MRI; Fig. 1a–i). Mice deficient
for either ΔNp63 or ΔNp73 and p53 showed marked decreases in
tumour burden (Fig. 1h, i). The reduction of ΔNp63 and ΔNp73 expression
resulted in increased expression of TAp63 and TAp73 (Fig. 1j–m
and Extended Data Fig. 4d) and increased apoptosis (Extended Data Fig.
4e–h) and senescence (Extended Data Fig. 4i–k). ΔNp63Δ/Δ;p53−/− and
ΔNp73Δ/Δ;p53−/− mice also had an increased lifespan (Fig. 1n). We found
differences in CD4/CD8-positive cells in young mice (4 weeks) (Extended
Data Fig. 4l–p), indicating that changes in T-cell development may lead
to a lower tumour incidence in double-mutant mice. Indeed, we found
that p53−/− thymic lymphomas are composed primarily of CD4/CD8
double-positive thymocytes while the ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/−
lymphomas contain very few CD4/CD8 double-positive thymocytes
(Extended Data Fig. 4q–t). Lastly, we asked whether thymic
stromal cells contribute to the apoptosis in the regressing lymphomas.
We sorted CD45-positive cells to select for T lymphocytes in p53−/−,
ΔNp63fl/fl;p53−/− and ΔNp73fl/fl;p53−/− mice and infected them with
adenovirus-Cre (Extended Data Fig. 4u). ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/−
thymocytes underwent apoptosis independent of the presence
of the stromal cells (Extended Data Fig. 4v). These data indicate that
inhibition of the ΔN isoforms of p63 and p73 serves to upregulate
TAp63 and TAp73 to compensate for loss of p53 in tumour suppression.

We found that the ΔN isoforms of p63 and p73 bind to the promoters
of the TA isoforms of p63 and p73, suggesting that the ΔN isoforms
of p63 and p73 can repress TAp63 and TAp73 transcription
(Extended Data Fig. 5a–i). We also found that the increase in
apoptosis and cellular senescence was dependent on TAp63 and TAp73
(Extended Data Fig. 5j–q).

We performed RNA sequencing of lymphomas after infection with
Ad-mCherry (ΔNp63fl/fl;p53−/− and ΔNp73fl/fl;p53−/−) and Ad-Cre-mCherry
(ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/−) and found that
thymic lymphomas from mice deficient for p53 and ΔNp63 clustered
with those from mice deficient for p53 and ΔNp73 (Extended Data Fig. 6a).
Ingenuity pathway analysis (IPA) (Fig. 1o) revealed genes involved in

Figure 1 | In vivo deletion of ΔNp63 or ΔNp73 in p53-deficient mice
suppresses lymphomagenesis. a–f, Magnetic resonance imaging (MRI) of
thymic lymphomas of indicated mice. Tumour volume (mm³) shown within
each panel. UN-D, undetectable. Tumours indicated by the dashed yellow line.
g–i, Quantification of the indicated thymic lymphomas, n = 4 mice.
j–m, Quantitative real time PCR (qRT-PCR), n = 4, P < 0.005. n, Kaplan-Meier
curve, n = 4, P < 0.005. Boxed numbers represent median survival.
o, Ingenuity pathway analysis (IPA) of RNA sequencing from thymic

metabolism including TP53-inducible glycolysis and apoptosis regulator
(TIGAR), and glutaminase 2 (GLS2) (refs 19, 20). While we found that
TIGAR and GLS2 were upregulated in either ΔNp63Δ/Δ;p53−/− or
ΔNp73Δ/Δ;p53−/− thymic lymphomas, we identified a novel gene, islet
amyloid polypeptide (IAPP) or amylin, which was upregulated by over
fivefold in both double-mutant thymic lymphomas. IAPP limits glucose
uptake, resulting in increased intracellular glucose-6-phosphate
(G-6-P) levels and decreased glycolysis. We validated IAPP, TIGAR and
GLS2 expression in thymic lymphomas derived from ΔNp63Δ/Δ;p53−/−
and ΔNp73Δ/Δ;p53−/− mice and found that IAPP is expressed at levels
over twofold higher in double-mutant mice (Fig. 1p and Extended Data
Fig. 6b–d). IAPP and GLS2 expression depend on TAp63 and TAp73
(Fig. 1q and Extended Data Fig. 6d). To determine whether TAp63 or
TAp73 transcriptionally regulate IAPP, we performed chromatin
immunoprecipitation in mouse embryonic fibroblasts (MEFs; Extended Data
Fig. 6e–g) and thymocytes (Fig. 1r, s). We found that TAp63 and TAp73
bind to sites located in the promoter (site 1), 1,756 nucleotides upstream
of the transcriptional start site, and intron 2 (site 2) of IAPP, 706
nucleotides downstream of the transcriptional start site (Extended Data
Fig. 6e–g). Because a greater binding affinity of TAp63 and TAp73 was
detected in the promoter region (site 1) of IAPP, we cloned this site into
a luciferase reporter gene and also mutated this site (Extended Data
Fig. 6h–k). Only the luciferase reporter gene containing wild-type IAPP
promoter site 1 was transactivated by TAp63 and TAp73 whereas the



lymphomas 48 h after infection with adenoviruses. Red oval indicates
significantly upregulated metabolic genes. p, q, RT-PCR for IAPP in thymic
lymphomas (p) or MEFs of the indicated genotypes using a non-targeting
shRNA (shNT) or shRNAs for TAp63 (shTAp63) or TAp73 (shTAp73) (q),
n = 4, P < 0.005. r, s, qRT-PCR of IAPP promoter site 1 using chromatin
immunoprecipitation, n = 3, P < 0.005. t, Cartoon showing transcriptional
activation of IAPP by TAp63 and TAp73. u, Extracellular acidification rate
(ECAR) as a measurement of glycolysis, P < 0.005.


mutant version was not. Taken together, these data indicate that IAPP
is a transcriptional target gene of TAp63 and TAp73 (Fig. 1t).
Expression of IAPP in p53−/− MEFs resulted in low levels of glycolysis
comparable to those in ΔNp63−/−;p53−/− and ΔNp73−/−;p53−/− MEFs
(Extended Data Fig. 6l, m and Fig. 1u). Conversely, when we knocked
down IAPP in ΔNp63−/−;p53−/− and ΔNp73−/−;p53−/− MEFs, the levels
of glycolysis were similar to those of p53−/− MEFs (Fig. 1u), indicating that
IAPP inhibits glycolysis. In vivo, we detected massive tumour regression
in ΔNp63fl/fl;p53−/− or ΔNp73fl/fl;p53−/− thymic lymphomas treated
with IAPP (Extended Data Fig. 7a and Fig. 2a, b, h, i, o, p), P < 0.05.
Conversely, in ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/− thymic lymphomas
treated with Ad-shIAPP-mCherry the tumours continued to
grow at a rate comparable to that of p53−/− thymic lymphomas (Fig. 2a–k, o–r),
P > 0.05 at 13 weeks. Additionally, p53−/− mice treated with Ad-IAPP
had an extended tumour-free survival period compared to p53−/− mice
or ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/− mice treated intratumorally
with Ad-shIAPP-mCherry (Extended Data Fig. 7a, b), indicating that
IAPP is a tumour suppressor gene and is causally involved in the
in vivo effects seen upon inactivation of ΔNp63 or ΔNp73. Given that
pramlintide, a synthetic analogue of amylin, is used to treat type 1 and
type 2 diabetes, we treated thymic lymphomas in ΔNp63fl/fl;p53−/−
and ΔNp73fl/fl;p53−/− mice. Indeed, three-weekly intratumoral injections
resulted in rapid tumour regression (Fig. 2e, l, s), P < 0.005 at 13 weeks.
This effect was enhanced by systemic intravenous treatment with

Figure 2 | IAPP is causally involved in tumorigenesis suppression in
p53-deficient thymic lymphomas. a–n, Thymic lymphomas were infected
with adenovirus (Ad)-mCherry (a, h), Ad-IAPP-mCherry (+IAPP) (b, i),
Ad-shIAPP-mCherry (c, d, j, k), or treated with pramlintide intratumorally
(IT) (e, l) or intravenously (IV) (f, m), or with 2DG (g, n). Yellow dashed lines
indicate tumour. Volume of tumour shown. UN-D, undetectable.


pramlintide (Fig. 2f, m, t and Extended Data Fig. 7c–q), P < 0.005, similar to
that seen in tumours treated with a known inhibitor of glycolysis,
2-deoxy-D-glucose (2DG; Fig. 2g, n, u). These data provide preclinical in vivo
evidence that pramlintide can be used to effectively treat p53-deficient
tumours. Using in vivo dynamic magnetic resonance spectroscopy to
measure the conversion of hyperpolarized [1-13C]pyruvate to lactate
as a proxy of glycolysis within the tumours, we found a marked reduction
in glycolysis in ΔNp63/p53 and ΔNp73/p53 double-deficient mice and after
introducing IAPP into p53−/− thymic lymphomas, similar to tumours
treated with 2DG (Fig. 2v). ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/− thymic
lymphomas infected with a short hairpin RNA for IAPP exhibited
levels of glycolysis similar to those found in p53−/− thymic lymphomas
(Fig. 2v). Pramlintide also inhibits glycolysis in tumours (Fig. 2v).

IAPP has been shown to induce reactive oxygen species (ROS) and
activate apoptosis. We found a marked increase in the levels of ROS
and apoptosis in thymic lymphomas expressing IAPP or treated with
pramlintide or 2DG, whereas neither ROS nor apoptosis occurred upon
inactivation of IAPP in thymic lymphomas from ΔNp63Δ/Δ;p53−/− and
ΔNp73Δ/Δ;p53−/− mice (Fig. 2w), indicating that upregulation of IAPP
inhibits glycolysis similarly to 2DG and leads to oxidative stress that
triggers apoptosis. While high levels of ROS are not commonly triggered
by inhibition of glycolysis, nutrient deprivation or excess can result in
the accumulation of ROS. Additionally, cancer cells tightly regulate ROS
by acquiring additional mutations, and compensatory mechanisms often
ensue and may be at play in the thymic lymphoma cells that acutely
downregulate glycolysis by IAPP.

o–u, Quantification of the indicated thymic lymphomas, n = 5 mice per
group. Significance indicated by the asterisks, P < 0.005. v, Quantification of
in vivo pyruvate to lactate conversion using dynamic magnetic resonance
spectroscopy as a measurement of glycolysis, n = 3 mice, P < 0.005.
w, Immunohistochemistry for reactive oxygen species (ROS) or cleaved
caspase 3. Positive nuclei are brown.


To extend our findings to human cancer where p53 is altered in the
majority of cases, we analysed human cancer cell lines containing p53
deletions or mutations. We used short interfering RNA (siRNA) to
knock down ΔNp63 or ΔNp73 in cells derived from a lung adenocarcinoma
(H1299) (Fig. 3a). Downregulation of ΔNp63 or ΔNp73 resulted
in upregulation of TAp63, TAp73 and IAPP (Fig. 3a) and an increase in
apoptosis and decrease in cell proliferation (Fig. 3b and Extended Data
Fig. 8a–d). To ask whether IAPP can also inhibit glycolysis in human
cancer cell lines, we transfected H1299 cells with siΔNp63, siΔNp73 or
IAPP (Fig. 3a). Knockdown of ΔNp63 or ΔNp73 or expression of IAPP
resulted in an inhibition of glycolysis (Fig. 3c, d) and glucose uptake
(Extended Data Fig. 8e, g), accumulation of ROS (Fig. 3d–f), and induction
of apoptosis (Fig. 3d, g, h). We inhibited ROS in these cells using
N-acetyl-L-cysteine (NAC) and observed no apoptosis (Fig. 3d–h).
Previous studies have indicated that IAPP inhibits glycolysis by increasing
intracellular G-6-P, in turn leading to an inhibition of hexokinase (refs 21, 22).
We measured the levels of intracellular G-6-P in H1299 cells and found
that cells expressing high levels of IAPP (H1299-siΔNp63,
H1299-siΔNp73, or H1299+IAPP) also had high levels of G-6-P while
knockdown of IAPP resulted in a diminution in G-6-P (Extended Data Fig. 8f, g).
Overexpression of glucose hexokinase II (HKII) led to a rescue of the
glycolytic capacity of H1299 cells expressing siΔNp63 or siΔNp73 to
levels similar to those in parental H1299 cells (Fig. 3c–g). These results
indicate that IAPP inhibits glycolysis through the inhibition of HKII.
We found that treatment of H1299 cells with pramlintide led to similar
effects on glycolysis and apoptosis (Fig. 3g–n). Taken together, these

Figure 3 | IAPP inhibits glycolysis and induces ROS and apoptosis in
p53-deficient human cancer cell lines. a, Representative western blot analysis,
n = 4. b, Immunofluorescence for apoptosis and 5-ethynyl-2′-deoxyuridine
(EdU) incorporation. c, Extracellular acidification rate as a measure of
panels c, e–h. e–h, Immunofluorescence and quantification for ROS (red) (e, f)

data demonstrate that IAPP and pramlintide inhibit glycolysis through
the inhibition of HKII.

IAPP is a secreted protein and binds to the calcitonin receptor (CALCR)
and RAMP3 (ref. 27). To determine whether IAPP functions through
these receptors to inhibit glycolysis, secreted media from H1299 cells
expressing siΔNp63 (siΔNp63M) or siΔNp73 (siΔNp73M), which
contains secreted IAPP (Fig. 4a and Extended Data Fig. 9a, b), was added to
H1299 cells, resulting in inhibition of glycolysis (Fig. 4b) and induction
of ROS and apoptosis (Fig. 4c, d). In contrast, when these media were
used to treat H1299 cells with knockdown of CALCR or RAMP3,
glycolysis was not inhibited and ROS and apoptosis were not induced (Fig. 4b–d),
indicating that the CALCR and RAMP3 receptors are critical for IAPP
function. We also treated the H1299 cells with media from H1299 cells
expressing siΔNp63 (siΔNp63M) or siΔNp73 (siΔNp73M) and an amylin
inhibitor, which led to high levels of glycolysis (Extended Data Fig. 9c)
and low levels of ROS and apoptosis (Fig. 4c, d). IAPP causes activation
of the NLRP3 inflammasome, which has been shown to be
anti-tumorigenic in certain cancers via IL-18 processing. We blocked
caspase-1 using an inhibitor and found that it prevented apoptosis of
H1299 cells (Fig. 4d), demonstrating that pyroptosis may also be an
important mechanism of action of IAPP.

To demonstrate the importance of the calcitonin receptor in vivo, we
treated p53−/− mice with thymic lymphomas at 10 weeks of age with
pramlintide and a calcitonin receptor inhibitor (Fig. 4e–m) and found
that this inhibition rendered pramlintide ineffective, demonstrating the
importance of the calcitonin receptor for IAPP/amylin/pramlintide
function (Fig. 4n). To further determine the anti-tumorigenic efficacy of


Figure 4 | Calcitonin and RAMP3 receptors are required for secreted IAPP
to suppress tumorigenesis. a, Cartoon depicting treatment of cells expressing
the indicated siRNAs and treated with media from the cells secreting IAPP
on the left. b, Extracellular acidification rate (ECAR) in H1299 cells.
c, d, Immunofluorescence for ROS (c) and apoptosis (d). e–m, MRI and
quantification of thymic lymphomas treated with placebo (e, h, k), pramlintide
(f, i, l), or pramlintide plus calcitonin inhibitor (CalR I.) (g, j, m), n = 5 mice.
n, Cartoon of IAPP signalling through RAMP3 and calcitonin receptor
(CALCR) to inhibit glycolysis and induce ROS and apoptosis. o, Kaplan-Meier
curves from patients with p53 mutant tumours and co-expression of IAPP,
RAMP3 and CALCR. Boxed numbers represent median survival.

pramlintide in cells with p53 deletions or mutations, we treated additional
human cancer cell lines (ref. 30) with pramlintide and a calcitonin receptor
inhibitor, resulting in increased glycolysis and decreased ROS and
apoptosis (Extended Data Fig. 9d–i). We assessed patient survival using data
from the Cancer Genome Atlas (TCGA) of patients with p53 mutations
and found that co-expression of IAPP, CALCR and RAMP3 correlated
with better patient survival in basal breast cancer (Fig. 4o), colorectal
cancer and lung squamous cell carcinoma (Extended Data Fig. 9j, k).
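
The survival comparisons above are reported as Kaplan-Meier curves with a median survival. The text does not describe how the curves were computed, so the following is only a minimal sketch of the standard product-limit estimator in plain Python. The function names (kaplan_meier, median_survival) and the follow-up values in the usage note are hypothetical and are not taken from the study or from TCGA.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each subject
    events -- 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached
```

With made-up follow-up times [2, 3, 3, 5, 8] and event flags [1, 1, 0, 1, 1], kaplan_meier steps down at times 2, 3, 5 and 8, and median_survival returns 5. Comparing two groups would additionally need a test such as the log-rank test, which is not shown here.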

Reactivation of p53 activity in tumours results in tumour suppression.
We have focused on interactions between the three p53 family members
and have revealed a novel strategy to target p53-deficient and mutant
cancers through amylin-based therapies like pramlintide.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS

Generation of ΔNp73 conditional knockout mice. The Cre-loxP strategy was
used to generate the ΔNp73 conditional knockout allele (ΔNp73fl). Genomic p73
DNA from intron 3 to intron 3′ was amplified from BAC clone DNA (BAC
RP23-186N8, Children's Hospital Oakland Research Institute). loxP sites flanking exon
3′ of p73 and neomycin (neo) gene flanked by frt sites inserted in intron 3′ were
cloned into pL253 (ref. 31). Mouse embryonic stem cells (G4) electroporated with the
targeting vector were analysed by Southern blot analysis for proper targeting of the
ΔNp73 allele. Resulting chimaeras were mated with C57BL/6 albino females and
genotyped as described below. Mice with germ line transmission of the targeted allele
(conditional, flox neo allele, fn) were crossed to the FLPeR mice to delete the neo
cassette. Resulting progeny were intercrossed with Zp3-cre (C57BL/6) transgenic
mice. ΔNp73fl/+;Zp3-cre females were mated with C57BL/6 males to generate
ΔNp73+/− mice. The ΔNp73+/− mice were intercrossed to generate ΔNp73−/−
mice. Compound mutant mice were generated by intercrossing the ΔNp63+/− and
ΔNp63fl/fl (ref. 15) and the ΔNp73−/− and ΔNp73fl/fl mice with the p53−/− mice (ref. 16). All
procedures were approved by the IACUC at University of Texas M.D. Anderson
Cancer Center.

Genotyping. Genomic DNA from tail biopsies was genotyped by Southern blot
analysis by digesting genomic DNA with AflII and HindIII or by PCR using the
following primers and annealing temperatures: (1) for wild type: wt-F,
5′-ACAGTCCTCTGCTTTCAGC-3′ and wt-R (fl-R), 5′-CACACAGCACTGGCCTTGC-3′,
annealing temp: 58 °C, (2) for ΔNp73flox: fl-F, 5′-CATAGCCATGGGCTCTCCT-3′
and fl-R (wt-R), 5′-TGTCCTGCTGCTGGTTGTAT-3′, annealing temp: 63 °C,
(3) for ΔNp73floxneo: flneo-F, 5′-GGGAGGATTGGGAAGACAAT-3′ and flneo-R,
5′-TGTCCTGCTGCTGGTTGTAT-3′, annealing temp: 60 °C and (4) for ΔNp73KO:
ko-F, 5′-CCTAGCCCAAGCATACTGGT-3′ and wt-R,
5′-TGTCCTGCTGCTGGTTGTAT-3′, annealing temp: 58 °C. Primers used to genotype for the Cre gene
are as follows: Cre-F, 5′-TGGGCGGCATGGTGCAAGTT-3′ and Cre-R,
5′-CGGTGCTAACCAGCGTTTTC-3′, annealing temp: 60 °C. The primers for ΔNp63WT,
ΔNp63KO, ΔNp63flox and p53 were previously described.
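
As a quick sanity check on primer lists like the one above, basic properties such as length, GC content and a rough melting temperature can be computed directly from the sequences. The sketch below is illustrative only: it uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is a coarse approximation best suited to short oligonucleotides and is not the method the authors report for choosing annealing temperatures. The helper names are hypothetical; the two sequences are the Cre-F and Cre-R primers quoted in the text.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule (assumption:
    a coarse estimate only, not a nearest-neighbour calculation)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Cre genotyping primers as listed in the Genotyping paragraph.
PRIMERS = {
    "Cre-F": "TGGGCGGCATGGTGCAAGTT",
    "Cre-R": "CGGTGCTAACCAGCGTTTTC",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)}")
```

Both Cre primers are 20 nucleotides long with GC fractions of 0.60 and 0.55. The Wallace estimates (64 and 62) are only rough guides, and empirically chosen annealing temperatures such as the 60 °C quoted above are typically set a few degrees lower.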

Cell lines. Mouse embryonic fibroblasts (MEFs) for the indicated genotypes were
generated as described previously. Human lung adenocarcinoma cells (H1299),
colorectal adenocarcinoma cells (SW-480) and breast adenocarcinoma cells
(MDA-MB-468) were purchased from ATCC and cutaneous SCC cell lines (SRB12, COLO16) (ref. 30)
were a gift from K. Y. Tsai. The MEFs, SW-480 and MDA-MB-468 cells were cultured
in DMEM (Cellgro) and H1299 cell lines were cultured in RPMI 1640 (Cellgro). The
SRB12 and COLO16 cell lines were grown in DMEM/Ham's F12 50/50 (Cellgro). All
cell lines used in the study tested negative for mycoplasma.

Immunohistochemistry. Mouse thymic lymphomas or thymi were dissected, fixed
in 10% formalin, and embedded in paraffin. Sections were de-waxed in xylene and
re-hydrated using decreasing concentrations of ethanol. Antigens were unmasked
in citrate buffer unmasking solution (Vector Laboratories) followed by incubation
with blocking solution, and 18 h incubation at 4 °C with the following antibodies:
cleaved caspase 3 (1:200) (Cell Signaling), PCNA (1:500) (Cell Signaling),
malondialdehyde (1:50) (Abcam). Visualization was performed using the ImmPact DAB
peroxidase substrate kit (SK4105, Vector Laboratories) and counter-stained with
haematoxylin (H-3401, Vector Laboratories). The slides were mounted using
VectaMount (H-5000, Vector Laboratories). Images were acquired using a Zeiss Axio
microscope and analysed with ProgRes Capture Pro 4.5 software.

Senescence-associated β-galactosidase staining. Senescence-associated
β-galactosidase staining on mouse thymic lymphoma was performed as described previously.

Quantitative real time PCR. Total RNA was prepared from MEFs or mouse tissues
using TRIzol reagent (Invitrogen). Complementary DNA was synthesized from
5 μg of total RNA using the SuperScript III First-Strand Synthesis Kit (Invitrogen)
according to the manufacturer's protocol followed by qRT-PCR using the SYBR
Fast qPCR master mix (Kapa Biosystems). qRT-PCR was performed using an ABI
7500 Fast Real-time PCR machine. Primers for mouse TAp63, ΔNp63, PUMA, Noxa,
bax, PML, p16 and p21 (refs 4, 34) and human TAp63, ΔNp63 and GAPDH were
used as described previously. Human primers for PUMA, Noxa, bax, PML,
p16, p21 were used as described previously and GLS2 and TIGAR as described
previously. Mouse primers for TAp73 are F: 5′-GCACCTACTTTGACCTCCCC-3′,
R: 5′-GCACTGCTGAGCAAATTGAAC-3′, ΔNp73 are F:
5′-ATGCTTTACGTCGGTGACCC-3′, R: 5′-GCACTGCTGAGCAAATTGGAAC-3′, IAPP are F:
5′-CTCCAAACTGCCATCTGAGGG-3′, R: 5′-CGTTTGTCCATCTGAGGGTT-3′.
Human primers used for TAp73 are F: 5′-CAGACAGCACCTACTTCGACCTT-3′,
R: 5′-CCGCCCACCACCTCATTA-3′ and for ΔNp73 are F:
5′-TTCAGCCAGTTGACAGAACTAAG-3′, R: 5′-GGCCGTTTGTTGGCATTT-3′.
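
The Methods do not state how qRT-PCR signal was converted into relative expression. One common approach for SYBR-based assays with a housekeeping gene such as GAPDH is the comparative Ct (2^-ΔΔCt) calculation, sketched below purely as an assumption. The function name and the Ct values in the usage note are hypothetical and are not data from the paper.

```python
def relative_expression(ct_target_sample: float,
                        ct_ref_sample: float,
                        ct_target_control: float,
                        ct_ref_control: float) -> float:
    """Fold change of a target gene by the comparative Ct method.

    Assumes (hypothetically) about 100% amplification efficiency for both
    the target and the reference gene, so one cycle equals a twofold change.
    """
    # Normalize the target to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)
```

For example, with illustrative Ct values of 22 (target) and 18 (reference) in a sample and 24 and 18 in a control, the target crosses threshold two cycles earlier after normalization, so relative_expression(22, 18, 24, 18) gives a fourfold increase.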

Western blot analysis. Fifty micrograms of protein were electrophoresed on a 10%
or 15% SDS PAGE and transferred to PVDF membrane as described previously.
Blots were probed with anti-p63 (1:500) (4A4, Santa Cruz), anti-TAp63 (1:1,000)
(BioLegend), anti-TAp73 (1:500) (IMG-246, Imgenex), anti-p73 (mouse) (1:250)
(IMG-259A, Imgenex), anti-p73 (1:1,000) (human) (EP436Y, Abcam), anti-p53
(WT) (1:1,000) (CM5, Vector Labs), anti-IAPP (1:1,000) (ab103580, Abcam),
anti-His (1:1,000) (G18, Santa Cruz), anti-hexokinase II (1:10,000) (C64G5, Cell
Signaling), anti-calcitonin receptor (1:1,000) (ab11042, Abcam), RAMP3 (1:1,000) (H125,
Santa Cruz), and cleaved caspase 3 (1:1,000) (Asp 175, Cell Signaling), at 4 °C for
18 h followed by incubation for 1 h at room temperature with the appropriate
secondary antibodies conjugated to horseradish peroxidase (1:5,000) (Jackson Lab).
β-Actin (Sigma 1:5,000) was used as a loading control. Detection was performed
using the ECL Plus Kit (Amersham) following the manufacturer's protocol and
X-ray autoradiography.

Characterization of thymus using flow cytometry. Thymi from 4-week-old mice 
and thymic lymphomas from 10-week-old mice were collected 48 h after adenovirus 
infection. Single cells were obtained by homogenizing the thymi through a 0.75-μm
filter. Cells were stained with CD3-PE (145-2C11), CD4-PerCP-Cy5.5 (RM4-5),
CD8-APC (53-6.7), CD45-FITC (30-F11) (BD Pharmingen), AnnexinV-Pacific Blue
(A35122, Life Technologies), and 7-AAD (V35124, Invitrogen) and sorted using a 
BD Aria Cell Sorter or analysed using the LSR Fortessa Cell Analyzer and FlowJo 
software. 

Chromatin immunoprecipitation (ChIP). MEFs were grown to near confluence
at passage 2 in DMEM medium with 10% serum as previously described. Thymocytes
from 6-week-old mice were collected 48 h after adenovirus infection. Cellular
proteins were cross-linked to DNA using 1% formaldehyde and chromatin was
prepared as described previously. TAp63 and ΔNp63 ChIP analysis was performed
using a pan-p63 antibody (4A4, Santa Cruz) as described previously, the TAp73
ChIP was performed using a TAp73 antibody (ab14430, Abcam) and the ΔNp73 ChIP
was performed using a p73 antibody (IMG 259A, Imgenex). Putative TAp63 and
TAp73 binding sites were scanned 3,000 bp upstream of the 5′ UTR and in intron 2
of the IAPP gene. qRT-PCR was performed using primers specific for the
indicated regions of IAPP: Promoter-Site 1 (−1802) forward:
5′-AGAGTTCAAGGTCATCCTCGAC-3′ and (−1731) reverse: 5′-TGTTCTGACATGCAGCCTCA-3′,
Intron-2-Site 2 (+678) forward: 5′-AGACAGGCATGCTTAGAGACG-3′ and
(+765) reverse: 5′-CACTCAGTGTGGATGTCCGT-3′, and non-specific site (+7532)
forward: 5′-GTGTGTGATGGTTTGGTGGAT-3′ and (+7623) reverse:
5′-ACAAGGCAGTTGATGGAGACT-3′. Similarly, putative ΔNp63 and ΔNp73
binding sites were scanned 10,000 bp upstream of the 5′ UTR and in intron 1 of TAp63
and TAp73. qRT-PCR was performed using the primers specific for the
indicated regions on the TAp63 promoter: Site 1 (−41) forward:
5′-CAGGAGCTCTCAAATCAAGTCAGA-3′ and (+37) reverse:
5′-ATCACAGAAGCCAGGACTTGTCAC-3′, and non-specific site (−3030) forward:
5′-GCTATAAATGTTTCCATGTGATGGATTGC-3′ and (−2973) reverse:
5′-TGCAGACTTAGCTATGGTCTCTTG-3′. Similarly, RT-PCR was performed using the primers
specific for the indicated regions on the TAp73 promoter: Site 1 (−1103) forward:
5′-CTAGCACACCAATCCAAGGAAAGA-3′ and (−1059) reverse: 5′-GCCTGCAGTCCGGGTTT-3′
and non-specific site (−2488) forward: 5′-ACTAGACCTCTGTACTTIGTGAACATACATTT-3′
and (−2382) reverse: 5′-GCACTCTCAFFATCCTGTAACAAAA-3′.

Dual luciferase reporter assay. Luciferase assays were performed using p53−/−;p63−/−
and p53−/−;p73−/− MEFs as described previously. To generate the luciferase
reporter gene (pGL3-IAPP), the DNA fragment containing the TAp63/TAp73-binding
site identified by ChIP was amplified from C57BL/6 genomic DNA by PCR
with the following primers containing 5′ XhoI and 3′ HindIII cloning restriction
enzyme sites: IAPP 5′-ATACTCGAGGTGTTCAGGGAACCTTCGGT-3′ (forward)
and 5′-ATAAAGCTTCACCTGACCTCCAAACTCCC-3′ (reverse). Similarly,
a mutant version of the luciferase reporter gene (pGL3-IAPP MUT) was generated
using QuikChange Lightning (Agilent Technologies) following the manufacturer's
instructions. The following primers were used to generate the mutant version:
5′-TATTGTTCTGACATCCAGCCTGATGTTGCCCAGTCTGGT-3′ (forward) and
5′-ACCAGACTGGGCAACATCAGGCTGGATGTCAGAACAATA-3′ (reverse).

Reverse transfection. Cells were transfected with 50 nM siΔNp63 (SASI_Hs02_00328367)
(Mission siRNA, Sigma), siΔNp73 (SASI_Hs02_00326884) (Mission
siRNA, Sigma), siTAp63 (SASI_Hs01_00246771) (Mission siRNA, Sigma), siTAp73
(SASI_Hs02_00339573) (Mission siRNA, Sigma), siRAMP3 (SASI_Hs01_00199036)
(Mission siRNA, Sigma), siCalcitonin receptor (SASI_Hs01_00077738) (Mission
siRNA, Sigma), siIAPP (SASI_Hs01_00183962) (Mission siRNA, Sigma) or siNT
(SIC_001) (Mission siRNA, Sigma) using Lipofectamine RNAiMAX (Invitrogen).
The siRNA and Lipofectamine were combined and added to
the well, followed by the addition of 200,000 cells per well in a six-well dish.

Transfections and generation of IAPP- and hexokinase II-expressing cells.
3 × 10° cells were plated in 10-cm dishes. MEFs and human cancer cells were
transfected with 8 μg Myc-DDK-IAPP (RC215074) (Origene) or 3.3 μg HKII (Plasmid
25529) (Addgene) using X-tremeGENE HP (Roche) and incubated for 48–60 h. Cells
were selected with G418, MEFs (350 μg μl−1) and human cancer cells (500 μg μl−1),
for a period of 9 days.

Secreted IAPP protein concentration. Twelve hours after knockdown of ΔNp63/ΔNp73
in human cancer cells, fresh serum-free medium was added to the cells.
Following a 60-h incubation, the medium was collected and concentrated using
Amicon Ultra-15 Centrifugal Filter Units (UFC901008, EMD Millipore).

RNA sequencing and analysis. Five micrograms of polyA+ RNA were used to
construct RNA-Seq libraries using the standard Illumina protocol. Mouse mRNA
sequencing yielded 30–40 million read pairs for each sample. The mouse mRNA-Seq
reads were mapped using TopHat onto the mouse genome build UCSC
mm9 (NCBI 37) and the RefSeq mouse genes. Gene expression and gene expression
differences were computed using Cufflinks. For each species, a combined
profile of all samples was computed; mRNA abundance was mean-centred and
Z-score transformed for each mRNA individually. Principal component analysis
was executed using the implementation within the R statistical analysis system.
Hierarchical clustering of samples was executed by first computing the symmetrical
sample distance matrix using the Pearson correlation between mRNA profiles
as a metric. Supervised sample analysis was performed using t-test statistics,
and heat maps were generated using the heatmap.2 function in R. For gene signatures
and pathway analysis, gene lists from the RNA-Seq comparing ΔNfl/fl;p53−/−
versus ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/− were obtained at a P value < 0.01.
The genes upregulated in ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/− and downregulated
in ΔNfl/fl;p53−/− were selected. The relative fold change of the genes
was calculated and sorted from highest to lowest. Genes with a greater than
1.5-fold increase were selected and run through Ingenuity Pathway Analysis (IPA)
(Ingenuity Systems) to screen for pathways and processes. Genes from the selected
pathways were cross-referenced with the Gene Set Enrichment Analysis (GSEA) (Broad
Institute) data analysis, DAVID Bioinformatics Resource 6.7 and the GSEA
implementation at the Molecular Signatures Database (MSigDB).
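
The per-gene Z-score transform, the Pearson-based sample distance matrix and the 1.5-fold filter described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the analysis code used in the study (which relied on R, TopHat and Cufflinks). The function names, the use of 1 − r as the distance, and the use of a ratio of group means as the fold change are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Mean-centre and Z-score each gene (row) across samples (columns)."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A gene with no variance carries no information; map it to zeros.
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def sample_distance_matrix(matrix):
    """Symmetric sample-by-sample matrix of 1 - r (assumed distance)."""
    cols = list(zip(*matrix))  # one expression profile per sample
    n = len(cols)
    return [[1.0 - pearson(cols[i], cols[j]) for j in range(n)] for i in range(n)]


def upregulated(genes, control, knockout, min_fold=1.5):
    """Genes with mean fold change above min_fold, sorted highest to lowest."""
    hits = []
    for gene, c, k in zip(genes, control, knockout):
        fold = mean(k) / mean(c)
        if fold > min_fold:
            hits.append((gene, fold))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

The resulting distance matrix would be the input to hierarchical clustering, and the filtered gene list the input to the pathway tools named in the text.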

Magnetic resonance imaging. MRI was performed at 10 weeks of age, when
the tumours were established and volumes ranged from 2.3 mm3 to 5 mm3. To
reduce the variation between different groups of mice, a cohort of n = 5 with similar
tumour volumes was established and tumour regression was monitored by MRI.
All mice were scanned once a week for a period of 35 weeks on a 7-T, 30-cm bore
BioSpec MRI system (Bruker Biospin Corp., Billerica, MA).

Hyperpolarized magnetic resonance spectroscopy. Dynamic MR spectroscopy
(MRS) of hyperpolarized (HP) [1-13C]pyruvate was performed in vivo in
tumour-bearing mice. To achieve polarization, a 26-mg sample of pyruvic acid
(Sigma-Aldrich, St. Louis, MO) with 15 mM of OX063 radical (GE Healthcare, Waukesha,
WI) and 1.5 mM Prohance (Bracco Diagnostics Inc., Monroe Township, NJ) was
polarized in a HyperSense DNP system (Oxford Instruments, Abingdon, Oxfordshire,
UK) as previously described. The frozen sample was dissolved in a 4 ml buffer
containing 40 mM Tris, 80 mM NaOH, and 50 mM NaCl, resulting in a final isotonic
and neutral solution containing 80 mM [1-13C]pyruvate. A dual-tuned 1H/13C linear
radio frequency volume coil with 72 mm internal diameter (ID) was used in
conjunction with imaging gradients with 12 cm ID. For anatomic imaging, the 1H
channel was used in transmit/receive mode. In addition to localizing scans, flow-weighted
oblique gradient echo images (TE = 1.4 ms; TR = 55 ms; 90° excitation; 3 cm × 3 cm
field-of-view (FOV) encoded over a 64 × 64 image matrix) were acquired to confirm
that the slice prescription for 13C measurements would not be obfuscated by
signals originating from within the heart. For carbon spectroscopy, the radio
frequency volume coil was used in transmit-only mode in conjunction with a
custom-built 15-mm ID 13C surface coil for signal reception. After dissolution, 200 μl of the
HP [1-13C]pyruvate solution was administered to the animals via tail-vein
catheter. A slice-selective pulse-acquire sequence (TR = 1,500 ms; 15° flip angle; 5 kHz
spectral bandwidth; 2,048 spectral points; 8 mm oblique slab; 120 repetitions) was
used for dynamic spectroscopy beginning approximately 15 s before injection. Data
were processed to generate spectral time-courses of the HP pyruvate and its lactate
product. Spectra were phase adjusted and the areas under the spectral peaks
associated with [1-13C]pyruvate and [1-13C]lactate were integrated over time to reflect
the overall signal observed from each metabolite over the course of the
measurement. Total lactate signal, which could only arise from interaction of HP pyruvate
with relevant metabolic enzymes, was normalized to the total signal from pyruvate.
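
The final normalization step (time-integrated lactate signal divided by time-integrated pyruvate signal) can be written out as a short sketch. The function names and the rectangle-rule integration are assumptions made for illustration; the per-repetition peak areas are taken as already extracted from the phase-adjusted spectra, and the 1.5 s spacing corresponds to the TR of 1,500 ms stated above.

```python
def total_signal(peak_areas, tr_s):
    """Integrate per-repetition peak areas over time (rectangle rule, spacing TR)."""
    return sum(peak_areas) * tr_s


def normalized_lactate(lactate_areas, pyruvate_areas, tr_s=1.5):
    """Total lactate signal normalized to total pyruvate signal."""
    pyruvate_total = total_signal(pyruvate_areas, tr_s)
    if pyruvate_total <= 0:
        raise ValueError("total pyruvate signal must be positive")
    return total_signal(lactate_areas, tr_s) / pyruvate_total


# Hypothetical peak areas for six repetitions (arbitrary units).
pyruvate = [0.0, 10.0, 8.0, 6.0, 4.0, 2.0]
lactate = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
ratio = normalized_lactate(lactate, pyruvate)
```

Because both metabolites are sampled on the same time grid, the TR cancels in the ratio; it is kept in the sketch only so that each total has the meaning of an area under the time-course.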

Glycolysis stress assay. Extracellular acidification rate (ECAR) was measured
using the extracellular flux analyser (SeaHorse Bioscience XF96) following the
manufacturer’s instructions. Forty-eight hours after transfection, the cells were plated at
a density of 1.5 × 10* cells per well in the XF 96-well cell culture plates. Twenty-four
hours after seeding, the culture medium was replaced with 180 μl of running medium
and incubated for 1 h at 37 °C in a non-CO2 incubator. Before calibration, 20 μl
of 50 mM glucose, 11 .M oligomycin and 650 mM 2DG were aliquoted into each 
port in the sensor cartridge. ECAR was measured after the addition of glucose and 
oligomycin and before the addition of 2DG. Extracellular acidification rate was
normalized to mpH min−1.

Glucose uptake measurement. Glucose uptake was calculated as a measure of
glucose-dependent proton secretion from the maximum and basal glucose consumption
after addition of 20 μl of 50 mM glucose and measured using the extracellular flux
analyser (SeaHorse Biosciences XF96).
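
One plausible reading of this calculation is the difference between the maximum rate recorded after glucose injection and the basal rate recorded before it. The sketch below assumes that reading; the function name, the use of the mean for the basal rate and the example readings are hypothetical and are not taken from the study.

```python
def glucose_dependent_ecar(basal_readings, post_glucose_readings):
    """Maximum ECAR after glucose minus the mean basal ECAR (mpH per min)."""
    basal = sum(basal_readings) / len(basal_readings)
    return max(post_glucose_readings) - basal


# Hypothetical ECAR readings (mpH per min) for one well.
delta = glucose_dependent_ecar([10.0, 12.0, 11.0], [30.0, 35.0, 33.0])
```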

Glucose-6-phosphate assay. Glucose-6-phosphate was measured using a glucose- 
6-phosphate assay kit (ab83426, Abcam) following the manufacturer’s instructions. 
Forty-eight hours after transfection, 2 × 10° cells were collected, homogenized and
passed through a 10-kDa spin-column filter. The eluate was collected and glucose- 
6-phosphate enzyme and substrate reaction was performed for 30 min and absor- 
bance was measured at 450 nm. 

Proliferation assay. The transfected human cancer cells were plated at a density of 
5 × 10° cells in 6 replicates in a 96-well dish. Twelve hours later, the cells were labelled
with 10 mM EdU (5-ethynyl-2′-deoxyuridine) for a period of 8 h. The assay was
performed using the Click-iT EdU microplate assay (Invitrogen). Images were obtained
using a Zeiss Axio fluorescent microscope and analysed using the AxioVision 
Image 4.5 software. 

Apoptosis assay. Cells were plated at a density of 1 × 10* cells in 6 replicates in a
96-well dish. Twelve hours later, the cells were washed with 1× annexin-binding
buffer and a cocktail of 5 μl annexin V–Alexa Fluor 488, 100 μg ml−1 propidium
iodide (PI) and 2 μg ml−1 Hoechst 33342 (Invitrogen) was added. Images were
captured using the Zeiss fluorescent microscope and AxioVision Image 4.5 software.
Quantification of the percent apoptosis was obtained using a high-throughput
immunofluorescence plate reader (Celigo).

ROS assay. Cells were plated at a density of 1 × 10* cells in 6 replicates in a 96-well
dish. Twelve hours later, the cells were incubated with a cocktail of 5 μM
CellROX Deep Red Reagent (C10422, Invitrogen) and 2 μg ml−1 Hoechst
33342 (Invitrogen) for 45 min at 37 °C. Images were captured using a Zeiss
fluorescent microscope and AxioVision Image 4.5 software. Quantification of the
percent ROS was obtained using a high-throughput immunofluorescence plate reader
(Celigo).

In vitro adeno-Cre infection. ΔNp63fl/fl;p53−/− and ΔNp73fl/fl;p53−/− MEFs were
plated at a density of 2.5 × 10° cells in 10-cm dishes before infection. Twelve hours
later, MEFs were infected with Adeno-CMV-mCherry or Adeno-CMV-Cre-mCherry 
(Gene Transfer Vector Core Facility, University of Iowa). The cells were infected at 
a multiplicity of infection of 6,000 particles per cell. The efficiency of infection was 
quantified by assessing mCherry-positive cells. 

In vivo adenovirus infection and IVIS Lumina imaging. All mice were anaesthetized
using isoflurane and 2% oxygen and placed on a custom bed. An incision was
performed to expose the sternum. Using a 28.5G U100 insulin syringe, Adeno-mCherry/
Adeno-Cre-mCherry (Gene Transfer Vector Core Facility, University of Iowa),
Adeno-IAPP-mCherry (Vector Labs) or Adeno-shIAPP-U6-mCherry (TRCN0000416196,
Mission shRNA) (Vector Labs) (sequence
CCGGTGTAAATTCTCATGCTAAGAACTCGAGTTCTTAGCATGAGAATTTACATTTTTTG) was surgically
administered by intra-thymic injection (5 × 10’ viral particles per gram of body weight)
through the 2nd and 3rd sternum. The incision was sealed using wound clips and
mice were allowed to recover. To determine the efficiency of the in vivo viral delivery 
to the thymic lymphoma, IVIS Lumina Imaging (Perkin Elmer) was performed 48 h 
later. Images were captured using a Mid-600 series bandwidth filter and analysed 
using the Living Image data analysis software. 

shRNA knockdown. shRNA plasmids for Trp63 (Clone ID: V3LMM_508694) 
(sequence TGATCTTCAGCAACATCTC) and Trp73 (Clone ID: V3LMM_438557) 
(sequence TGCAGGTGGAAGACATCCA) were obtained from the MD Anderson 
shRNA core facility (Open Biosystems). 293T cells were plated at a density of 2.5 × 10°
cells in 10-cm dishes. Three micrograms of shRNA and packaging vectors were
transfected as described previously. Cells were selected using puromycin (3 μg ml−1)
for 7 days. 

In vitro and in vivo administration of 2-deoxy-D-glucose. 1 × 10* cells were plated
in 6 replicate wells in a 96-well dish. Twelve hours later, the human cancer cells
were treated with 50 mM final concentration of 2-deoxy-D-glucose (2DG)
(D8375-5G, Sigma) for 1 h. Similarly, 2DG (500 mg per kg of tumour weight)
(D8375-5G, Sigma) was administered directly into the lymphoma of mice as described earlier.

N-acetyl-L-cysteine treatment. 1 × 10* cells were plated in 6 replicate wells in a
96-well dish. Twelve hours later, cells were treated with N-acetyl-L-cysteine (NAC)
(A8199, Sigma) at a final concentration of 2 mM for a period of 1 h.

Amylin and caspase inhibitor treatment. 2 × 10° cells were plated in triplicate in
a 6-well dish. Twelve hours later, cells were treated with amylin peptide (5 μM)
(A5972, Sigma) or with a caspase 1 inhibitor (20 μM) (Z-YVAD-FMK-218746,
Calbiochem) for a period of 48 h.

In vitro and in vivo administration of pramlintide acetate. 2 × 10° cells were plated
in duplicate in a 6-well dish. Twelve hours later, cells were treated with 10 μg ml−1
pramlintide acetate (AMYLIN Pharmaceuticals) or placebo for a period of 48 h.
Pramlintide acetate (AMYLIN Pharmaceuticals) or placebo (sodium acetate/acetic
acid) was surgically administered through non-invasive intra-thymic injection using
a multiple dose protocol of pramlintide acetate (30 μg per gram of tumour weight).
One injection per week for three weeks was administered directly into the thymic
lymphoma of the animal. Another cohort of mice was treated bi-weekly for 3 weeks
by intravenous (IV) tail-vein injection of pramlintide acetate (45 μg per kg body
weight) or placebo. The investigator was blinded to the treatment administered to
each mouse. Tumour volumes were monitored weekly by MRI. Health and blood
glucose levels of the treated animals were monitored weekly.

In vitro and in vivo administration of calcitonin receptor antagonist. 2 × 10°
cells were plated in duplicate in a 6-well dish. Twelve hours later, cells were treated
with calcitonin receptor antagonist (1 nM) (AC187, Tocris Bioscience) for a period
of 48 h with or without simultaneous pramlintide treatment. Similarly, a chronic 
dose of calcitonin receptor antagonist (1 nmol per gram of tumour weight) was 
administered through non-invasive intra-thymic injections with one injection every 
week for a period of three weeks with or without simultaneous pramlintide treat- 
ment. Tumour volume was monitored and measured weekly by MRI. 

Survival analysis. Survival analysis was conducted for the IAPP, RAMP3 and CalCR
genes in the following data sets: the Memorial Sloan Kettering Cancer Center and the
TCGA Cancer cohort. We considered four major cancer types with high p53
mutation rates: lung squamous cell carcinoma, head and neck squamous
cell cancer, basal breast cancer, and colon cancer. The co-expression of the
three genes was analysed only in cases with p53 mutation. In all cases, we considered
gene expression changes above or below two standard deviations with respect to the
normal controls. The log-rank test and Cox P test were used to assess significance
between the samples with or without expression changes of the IAPP, RAMP3 and
CalCR genes using the cBioPortal for cancer genomics.
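
The two-standard-deviation criterion can be illustrated with a short sketch that flags tumour samples whose expression of one gene lies more than two standard deviations above or below the mean of the normal controls. This is a simplified stand-in for the thresholding performed within cBioPortal, not its implementation; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def expression_altered(tumour_values, normal_values, threshold=2.0):
    """Flag samples whose |z| relative to normal controls exceeds the threshold."""
    mu = mean(normal_values)
    sd = stdev(normal_values)  # sample standard deviation of the controls
    return [abs((value - mu) / sd) > threshold for value in tumour_values]


# Hypothetical expression values for one gene.
normal = [8.0, 10.0, 12.0]
tumour = [15.0, 10.5, 5.0, 13.0]
flags = expression_altered(tumour, normal)
```

Samples flagged in this way would form the group "with expression changes" that is compared against the remaining samples in the log-rank test.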

Statistics. Sample size for mouse cohorts in each experiment was chosen based on
the penetrance of the thymic lymphoma phenotype of the p53−/− mouse model
(80%). Twenty to thirty mice were used for survival analyses. Data were analysed
using a one-way ANOVA test, or a two-sided Student's t-test for
comparison between two groups of data. A P value of 0.05 was considered significant.
Data are represented as mean ± s.e.m.
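
The summary statistics named here can be sketched with the Python standard library: the mean and s.e.m. of one group, and the pooled-variance Student's t statistic with its degrees of freedom for two groups. The function names and example data are illustrative assumptions. Converting t to a two-sided P value requires the cumulative t distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical measurements for two groups, n = 4 each.
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [3.0, 4.0, 5.0, 6.0]
t_stat, dof = student_t(group_a, group_b)
```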


31. Liu, P., Jenkins, N. A. & Copeland, N. G. A highly efficient recombineering-based
method for generating conditional knockout mutations. Genome Res. 13,
476–484 (2003).

32. Lewandoski, M., Wassarman, K. M. & Martin, G. R. Zp3-cre, a transgenic mouse line
for the activation or inactivation of loxP-flanked target genes specifically in the
female germ line. Curr. Biol. 7, 148–151 (1997).

33. Jackson, J. G. et al. p53-mediated senescence impairs the apoptotic response to
chemotherapy and clinical outcome in breast cancer. Cancer Cell 21, 793–806
(2012).

34. Su, X. et al. TAp63 prevents premature aging by promoting adult stem cell
maintenance. Cell Stem Cell 5, 64–75 (2009).

35. Lin, Y. L. et al. p63 and p73 transcriptionally regulate genes involved in DNA repair.
PLoS Genet. 5, e1000680 (2009).

36. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals
unannotated transcripts and isoform switching during cell differentiation. Nature
Biotechnol. 28, 511–515 (2010).

37. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis
of large gene lists using DAVID bioinformatics resources. Nature Protocols 4, 44–57
(2009).

38. Ardenkjaer-Larsen, J. H. et al. Increase in signal-to-noise ratio of > 10,000 times in
liquid-state NMR. Proc. Natl Acad. Sci. USA 100, 10158–10163 (2003).

39. Sandulache, V. C. et al. Glycolytic inhibition alters anaplastic thyroid carcinoma
tumor metabolism and improves response to conventional chemotherapy and
radiation. Mol. Cancer Ther. 11, 1373–1380 (2012).

40. Maddocks, O. D. et al. Serine starvation induces stress and p53-dependent
metabolic remodelling in cancer cells. Nature 493, 542–546 (2013).

41. The Cancer Genome Atlas Research Network. Comprehensive genomic
characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).

42. Agrawal, N. et al. Exome sequencing of head and neck squamous cell carcinoma
reveals inactivating mutations in NOTCH1. Science 333, 1154–1157 (2011).

43. Stransky, N. et al. The mutational landscape of head and neck squamous cell
carcinoma. Science 333, 1157–1160 (2011).

44. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human
breast tumours. Nature 490, 61–70 (2012).

45. Banerji, S. et al. Sequence analysis of mutations and translocations across breast
cancer subtypes. Nature 486, 405–409 (2012).

46. The Cancer Genome Atlas Network. Comprehensive molecular characterization of
human colon and rectal cancer. Nature 487, 330–337 (2012).

47. Cerami, E. et al. The cBio Cancer Genomics Portal: an open platform for exploring
multidimensional cancer genomics data. Cancer Discovery 2, 401–404 (2012).


©2015 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 1 | Generation and characterization of ΔNp73
conditional knockout mice. a, The ΔNp73 targeting vector was generated by
inserting loxP sites (triangles) flanking exon 3′ and a neomycin cassette (neo)
flanked by frt sites (squares). The location of PCR primers in each allele is
shown by blue arrows. The targeted region of the floxed allele is depicted by
yellow-dashed lines. b, Southern blot analysis using the 5′ probe shown in a and
tail genomic DNA derived from mice of the indicated genotypes. c, PCR
analysis using tail genomic DNA of the indicated genotypes. d, Western blot
analysis using mouse embryo fibroblasts (MEFs) of the indicated genotypes.
e, f, RT-PCR in MEFs of the indicated genotypes, n = 4, P < 0.005. Statistical
significance is indicated by black asterisks.

Extended Data Figure 2 | Decreased thymic lymphomagenesis and
increased survival in mice double deficient for ΔNp63 and p53 or ΔNp73
and p53. a, Quantification of thymic lymphoma incidence (n = 30 mice).
b, Table showing thymic lymphoma volumes. The difference in tumour
volumes between p53−/− and ΔNp63+/−;p53−/− and between p53−/− and
ΔNp73−/−;p53−/− was statistically significant with P values < 0.03 and
< 0.002, respectively. c, Kaplan–Meier survival in mice. Boxed numbers
indicate median survival. d, e, Western blot analysis of thymic lymphomas of
the indicated genotypes. Arrows indicate specific isoforms, and asterisks
indicate non-specific bands. f–h, qRT-PCR for PUMA (f), Noxa (g), and bax
(h) in thymic lymphomas of the indicated genotypes, n = 4, P < 0.005.
i, Immunohistochemistry (IHC) for cleaved caspase 3 in thymic lymphomas.
j, Quantification of apoptosis as assessed by cleaved caspase 3 staining, n = 20
fields of 3 biological replicates, P < 0.005. k–m, qRT-PCR for PML (k),
p16 (l), and p21 (m) in indicated thymic lymphomas, n = 4, P < 0.005. n, IHC
for PCNA in indicated thymic lymphomas. o, Quantification of the percentage
of proliferation as assessed by PCNA staining, n = 20 fields of 3 biological
replicates, P < 0.005. Statistical significance indicated by black asterisks.

Extended Data Figure 3 | Increased apoptosis and cell cycle arrest in
ΔNp63+/−;p53−/− and ΔNp73−/−;p53−/− thymocytes after genotoxic
stress. a, Western blot analysis in thymocytes derived from mice 6 h after
treatment with 0 Gy or 10 Gy gamma irradiation. b–f, qRT-PCR for
TAp63 (b), TAp73 (c), PUMA (d), Noxa (e), and bax (f) from samples shown
in a, n = 4, P < 0.005. qRT-PCR normalized to samples treated with 0 Gy.
g, Immunohistochemistry (IHC) for cleaved caspase 3 in samples from a.
h, Quantification of the percentage of apoptosis as assessed by cleaved caspase
3 staining, n = 20 fields of 3 biological replicates, P < 0.005. i–k, RT-PCR
for PML (i), p16 (j), and p21 (k) using total RNA from samples shown in
a, n = 4, P < 0.005. l, IHC for PCNA in samples shown in a. m, Quantification
of the percentage of proliferation as assessed by PCNA staining, n = 20 fields
of 3 biological replicates, P < 0.005. Statistical significance is indicated by
black asterisks.

Extended Data Figure 4 | In vivo intra-thymic delivery of adenovirus-Cre-mCherry.
a–c, IVIS Lumina imaging of thymic lymphomas of mice of the
indicated genotypes infected with adenovirus (Ad)-mCherry (a) or
Ad-Cre-mCherry (b, c) at 10 weeks of age and 48 h after adenoviral delivery. Red
fluorescence indicates viral delivery to the thymus shown by the yellow dashed
ovals. Red fluorescence near the mouth is due to auto-fluorescence of calcium
and mineral deposits in the teeth. d, Western blot analysis using lysates
from indicated thymic lymphomas 48 h after infection with adenovirus
(Ad)-mCherry or Ad-Cre-mCherry. e, f, Quantitative real-time PCR (qRT-PCR) of
thymic lymphomas 48 h after infection with Ad-mCherry (ΔNfl/fl;p53−/−) or
Ad-Cre-mCherry (ΔNp63Δ/Δ;p53−/− or ΔNp73Δ/Δ;p53−/−), n = 4, P < 0.005.
g, Immunohistochemistry (IHC) for cleaved caspase 3 in thymic lymphomas
48 h after infection with Ad-mCherry (ΔNfl/fl;p53−/−) or Ad-Cre-mCherry
(ΔNp63Δ/Δ;p53−/− or ΔNp73Δ/Δ;p53−/−). h, Quantification of apoptosis as
assessed by cleaved caspase 3 staining of the indicated thymic lymphomas,
n = 20 fields of 3 biological replicates, P < 0.005. i, j, RT-PCR of thymic
lymphomas 48 h after treatment with Ad-mCherry (ΔNfl/fl;p53−/−) or
Ad-Cre-mCherry (ΔNp63Δ/Δ;p53−/− or ΔNp73Δ/Δ;p53−/−), n = 4, P < 0.005.
k, Senescence-associated β-galactosidase (SA-β-gal) staining (blue) of thymic
lymphomas 48 h after treatment with Ad-mCherry (ΔNfl/fl;p53−/−) or
Ad-Cre-mCherry (ΔNp63Δ/Δ;p53−/− or ΔNp73Δ/Δ;p53−/−). l–o, Flow cytometry
plots of the indicated thymocytes at 4 weeks of age. p, Bar graph showing
quantification of CD4, CD8, and CD4/CD8 double-positive (DP) cells. n = 3
mice per genotype, P < 0.005. q–s, Flow cytometry plots of thymic lymphoma
cells 48 h after adenovirus-mCherry or adenovirus-CRE treatment for the
indicated genotypes. t, Bar graph showing quantification of CD4, CD8, and
CD4/CD8 double-positive (DP) cells in the indicated genotypes. n = 3 mice per
genotype, P < 0.005. u, Cartoon representation of isolation of CD45-positive
thymic lymphoma cells from 10-week-old mice of indicated genotypes.
v, Western blot analysis of CD45-positive thymic lymphoma cells after
treatment with Ad-mCherry (ΔNfl/fl;p53−/−) or Ad-CRE-mCherry
(ΔNp63Δ/Δ;p53−/− and ΔNp73Δ/Δ;p53−/−). Statistical significance is indicated
by black asterisks.

Extended Data Figure 6 | Metabolic genes including IAPP are upregulated
in thymic lymphomas deficient for ΔNp63 or ΔNp73 and p53. a, Supervised
hierarchical clustering of RNA-sequencing data from thymic lymphomas
48 h after treatment with Ad-mCherry (ΔNfl/fl;p53−/−) or Ad-Cre-mCherry
(ΔNp63Δ/Δ;p53−/− or ΔNp73Δ/Δ;p53−/−). b, c, qRT-PCR for GLS2 (b) and
TIGAR (c) in the indicated thymic lymphomas, n = 4, P < 0.005. d, qRT-PCR
for GLS2 in MEFs of the indicated genotypes expressing shRNAs for a
non-targeting sequence (shNT), TAp63 (shTAp63) and TAp73 (shTAp73),
IAPP promoter and intron 2. f, g, qRT-PCR of promoter site 1 using chromatin
immunoprecipitation in MEFs of the indicated genotypes, n = 3, P < 0.005.
h–k, Dual luciferase reporter assay for pGL3-IAPP-promoter site 1 (h, i) and a
mutant version of this reporter gene (pGL3-IAPP MUT) (j, k). Genotypes of

Extended Data Figure 7 | Systemic in vivo delivery of pramlintide results in
tumour regression in p53-deficient thymic lymphomas. a, Western blot
analysis showing IAPP expression in the indicated thymic lymphomas, n = 5
mice. b, Kaplan–Meier survival indicating thymic lymphoma-free survival.
n = 8 mice per group, P < 0.005. c, Cartoon indicating schedule of MRI
imaging and injection (Inj.) of pramlintide in mice with p53-deficient thymic
lymphomas. d–q, MRI imaging at 10, 11, 12 and 13 weeks after treatment with
placebo (d–g) or pramlintide (i–p); quantification of tumour volumes in
placebo (n = 3) (h) and pramlintide-treated mice (n = 7) (q), P < 0.005.
Statistical significance indicated by black asterisk.

Extended Data Figure 8 | IAPP inhibits glycolysis by increasing
intracellular G-6-P levels. a, b, Quantification of apoptosis (a) and
proliferation (b), n = 20 fields of 3 biological replicates, P < 0.005. c, RT-PCR
for the target genes indicated on the x-axis in the indicated H1299 cells
expressing the indicated siRNAs, n = 4. Asterisks indicate statistical
significance (P < 0.005) relative to siNT. d, Western blot analysis of H1299 cells
treated with the indicated siRNAs. e, f, Bar graph indicating glucose-dependent
proton secretion as a measure of glucose uptake and intracellular levels of
glucose-6-phosphate in H1299 cells with the indicated siRNAs and treatments
(f). g, Colour-coded legend for panels e, f and i. h, Western blot analysis of
H1299 cells expressing the indicated siRNAs. i, Immunofluorescence analysis
for ROS (red) or apoptosis (green or green/red) in H1299 cells expressing the
indicated siRNAs and treated with 2DG and/or NAC.

Extended Data Figure 9 | Treatment of p53-mutant human cancer cell lines
with pramlintide inhibits glycolysis and induces ROS and apoptosis.
a, b, Western blot analysis of H1299 cells expressing the indicated siRNAs (a) or
concentrated media derived from H1299 cells expressing siNT, siΔNp63, or
siΔNp73 (b). c, Extracellular acidification rate (ECAR) using H1299 cells
expressing the indicated siRNAs and treated with the indicated media
containing secreted IAPP and treated with the indicated amylin inhibitor (AI).
d–g, Extracellular acidification rate (ECAR) as a measure of glycolysis in
SW480 (d), MDA-MB-468 (e), SRB12 (f) and COLO16 (g) human cancer cell
lines after treatment with placebo, pramlintide, or pramlintide and a calcitonin
receptor inhibitor (CalR I.), n = 3, P < 0.005. Glucose, oligomycin, and
2-deoxy-D-glucose (2DG) were supplied to the media at the indicated time
points shown on the x-axis. h, i, Immunofluorescence for ROS (red) (h) and
apoptosis (green) (i) on the indicated cells, n = 3. j, k, Kaplan–Meier survival
curves using data from patients with p53 mutant tumours with the indicated
cancers and co-expression of IAPP, RAMP3 and CALCR. Boxed numbers
represent median survival.

The mitotic checkpoint complex binds a second
CDC20 to inhibit active APC/C 


Daisuke Izawa1 & Jonathon Pines1


The spindle assembly checkpoint (SAC) maintains genomic stability 
by delaying chromosome segregation until the last chromosome has 
attached to the mitotic spindle. The SAC prevents the anaphase pro- 
moting complex/cyclosome (APC/C) ubiquitin ligase from recogniz- 
ing cyclin B and securin by catalysing the incorporation of the APC/C 
co-activator, CDC20, into a complex called the mitotic checkpoint 
complex (MCC). The SAC works through unattached kinetochores 
generating a diffusible ‘wait anaphase’ signal that inhibits the APC/C
in the cytoplasm, but the nature of this signal remains a key unsolved 
problem. Moreover, the SAC and the APC/C are highly responsive to 
each other: the APC/C quickly targets cyclin B and securin once all the 
chromosomes attach in metaphase, but is rapidly inhibited should 
kinetochore attachment be perturbed. How this is achieved is also 
unknown. Here, we show that the MCC can inhibit a second CDC20 
that has already bound and activated the APC/C. We show how the 
MCC inhibits active APC/C and that this is essential for the SAC. 
Moreover, this mechanism can prevent anaphase in the absence of 
kinetochore signalling. Thus, we propose that the diffusible ‘wait ana- 
phase’ signal could be the MCC itself, and explain how reactivating 
the SAC can rapidly inhibit active APC/C. 




The MCC is an APC/C inhibitor containing the MAD2, BUBR1 and 
BUB3 checkpoint proteins in a complex with CDC20, where MAD2 
and BUBR1 inhibit CDC20 by binding to substrate and APC/C recognition 
motifs. To elucidate how the SAC inhibits the APC/C we produced 
recombinant human MCC (rMCC) by co-expressing His6-tagged 
MAD2, streptavidin binding protein (SBP)-tagged BUBR1 and untagged 
CDC20 at an 8:1:2 ratio (Extended Data Fig. 1a–e) in baculovirus-infected 
Sf9 cells. We co-purified MAD2, BUBR1 and CDC20 in a ‘core MCC’ 
complex at a 1:1:1 ratio (Extended Data Fig. 1b). 

Incubating core rMCC with recombinant His6-tagged CDC20 showed 
that core MCC could bind a second CDC20 molecule (Fig. 1a and 
Extended Data Fig. 1f), which was not due to CDC20 homodimerizing 
(Fig. 1a). Including BUB3 in the core rMCC made no difference to the 
amount of CDC20 that was bound (Extended Data Fig. 2). We note here 
recent speculation that the MCC may contain two molecules of CDC20. 
The mode of binding to the second CDC20 differed from that required 
to form the core MCC because core MCC could bind to a CDC20^ΔKILR 
mutant unable to bind MAD2 (Fig. 1a and Extended Data Fig. 1c). This 
also excluded the possibility that the second CDC20 had exchanged with 
CDC20 in the core MCC. 

Figure 1 | Core MCC can inhibit APC/C^CDC20. a, Second CDC20 binding 
assay. His-SBP-CDC20 or rMCC, composed of untagged CDC20, SBP-BUBR1 
and His-MAD2, were incubated with streptavidin (strep.) beads, unbound 
proteins washed away, and the beads incubated with either wild-type (WT) 
or ΔKILR (K129ILR/AAAA) mutant His-CDC20 (Extended Data Fig. 1f). 
Proteins retained on the streptavidin beads were analysed by quantitative 
immunoblotting. Molecular mass markers are on the left; kDa, kilodalton. 

b, c, MCC prefers to bind APC/C^CDC20. The APC/C was immunoprecipitated 
from CDC20-depleted mitotic extracts supplemented with a constant amount 
of core MCC, and increasing amounts of SBP-CDC20 (b), or vice versa (c), 
and analysed as in a. IP, immunoprecipitate. d, The MCC is an APC/C^CDC20 
inhibitor. The APC/C was immunoprecipitated as in b and incubated with 
infrared-dye-conjugated securin in a ubiquitylation reaction at 37 °C for 15 or 
30 min with core rMCC and/or SBP-CDC20 (1.5:1 ratio of core rMCC to 
rCDC20, see Extended Data Fig. 3a, b). Securin ubiquitylation (securin-ubi_n) 
was analysed by SDS-PAGE and a LI-COR Odyssey scanner. The amount of 
unconjugated securin is shown below the panel (level at 0 min is set to 1.0). 
e-g, The MCC inhibits active APC/C. e, The APC/C^ΔCDC20 was pre-incubated 
with SBP-CDC20 to form APC/C^CDC20, unbound SBP-CDC20 washed away, 
and APC/C activity assayed as in panel d for 30 min. A 10-fold excess of 
rMCC to immunoprecipitated APC/C was added at 0 min (see also Extended 
Data Fig. 3c). f, APC/C activity was assayed as in e except that rMCC was 
added 5 min after starting the reaction. g, Unconjugated securin was measured 
from three independent experiments and the mean and s.d. plotted against 
time. To estimate APC/C inhibition, the level of securin at 5 min was set to 1.0. 
All results in Fig. 1 are representative of three or more experiments. 


¹The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK. 

The question arose as to why we could not purify rMCC with two 
molecules of CDC20. We postulated that the second CDC20 bound less 
stably than the first CDC20, which is cooperatively bound by MAD2 
and BUBR1; therefore, limited amounts of CDC20 would preferentially 
incorporate into the core MCC. In agreement with this, we purified some 
core rMCC bound to a second CDC20 from Sf9 cell lysates containing 
excess CDC20 (50% bound in Extended Data Fig. 1g). We noted that 
increasing the amount of functional SBP-CDC20 enhanced core rMCC 
binding to the APC/C (Fig. 1b and Extended Data Fig. 1h, i). This 
indicated that core MCC could bind CDC20 associated with the APC/C, 
and that core rMCC did not compete with SBP-CDC20 for APC/C 
binding (Fig. 1c). This agreed with our previous finding that the MCC and 
CDC20 bind to the APC/C through different sites. 

To determine the properties of MCC as an APC/C^CDC20 inhibitor 
we used a reconstituted ubiquitylation assay with APC/C isolated from 
CDC20-depleted mitotic cells (APC/C^ΔCDC20), and incubated it with 
SBP-CDC20 and/or core rMCC. Adding CDC20 strongly activated the 
APC/C, whereas, as expected, core MCC alone only weakly stimulated 
the APC/C (Fig. 1d). Neither MAD2 nor BUBR1 alone can inhibit the 
mitotic APC/C, and together they require pre-incubation to inhibit 
interphase APC/C^CDC20 (ref. 7). By contrast, core MCC was a potent 
and rapid inhibitor of active APC/C^CDC20: as well as preventing CDC20 
from activating the APC/C (Fig. 1d and Extended Data Fig. 3a, b), it 
inhibited active mitotic APC/C within 10 min (Fig. 1e-g and Extended 
Data Fig. 3c). 

To gain insight into how core MCC could inhibit active APC/C^CDC20, 
we sought to identify how core MCC bound to a second CDC20. Studies 
on yeast MAD3/BUBR1 had implicated a number of D-box and KEN 
box motifs in binding to CDC20, and as important for the SAC. A 
D-box bound to the side of the CDC20 β-propeller domain in the MCC 
structure, whereas a KEN box bound to the top face. We hypothesized 
that the second CDC20 might bind to the core MCC in a similar manner; 
therefore, we introduced mutations into the D-box receptor (D177A; 
ΔDR) and the KEN-box receptor (N329A/N331A/T377A/R445A; ΔKR) 
of CDC20. Both these CDC20 mutants bound much less well to core 
rMCC in vitro (Fig. 2a, b). Since the ΔDR mutant could still be 
incorporated into the core MCC (Fig. 2c), we tested whether inhibiting a 
second CDC20 was important for the SAC (Fig. 2d). We replaced 
endogenous CDC20 with the ΔDR mutant, or the ΔKR mutant as a positive 
control, and assayed the ability of cells to arrest in response to 
nocodazole. As expected, the ΔKR mutant abrogated the SAC because it 
could not form the core MCC (Fig. 2c-e). By contrast, the ΔDR mutant 
assembled into the core MCC and bound to the APC/C (Fig. 2c, d), yet 
the SAC was still defective (Fig. 2e). Cells expressing the ΔDR mutant, 
however, took more time to exit mitosis than those expressing the ΔKR 
mutant (Fig. 2e). We thought this might be because the ΔDR mutant 
was less effective at activating the APC/C; consistent with this, cyclin 
B1 was degraded more slowly in these cells (Extended Data Fig. 4). These 
data supported the idea that the MCC inhibited a second CDC20 as 
part of a functional SAC. 

Since CDC20 required its D-box and KEN box receptors to bind the 
core MCC, we identified the D-box and a KEN box on BUBR1 
responsible for binding CDC20. The structure of the core MCC implicated a 
putative D-box, but BUBR1 has two KEN-boxes: the first (K26EN) is 
essential to form the core MCC (Extended Data Fig. 1d), whereas the 
second (K304EN) is not required to form the core MCC but is still 
important for the SAC. We thought the second KEN box a more likely 
candidate to bind a second CDC20; therefore, we mutated the putative 
D-box (R224xxL: ΔD-box) and the second KEN-box (ΔKEN2) in human 
BUBR1. Both mutants were incorporated into the core MCC in vitro 
(Fig. 3a and Extended Data Fig. 5a) and in vivo (Fig. 3c); both inhibited 
the CDC20 within the core MCC (Extended Data Fig. 5b), but reduced 
binding to a second molecule of CDC20 (Fig. 3a, b). (Note that BUBR1 
alone did not bind two molecules of CDC20 because neither the D-box 
nor the second KEN-box was required to bind CDC20 in the absence 
of MAD2 (Extended Data Fig. 5c).) Furthermore, replacing endogenous 

Figure 2 | The MCC binds to CDC20 through substrate recognition 
domains. a, b, Mutating CDC20 substrate recognition domains reduces 
binding to rMCC. a, rMCC was incubated with in vitro translated (IVT) 
3×Flag-tagged wild-type CDC20, or CDC20^ΔDR, or CDC20^ΔKR mutants 
(indicated by *) and analysed as in Fig. 1a. ΔDR, D177 mutated to alanine; 
ΔKR, N329, N331, T377 and R445 mutated to alanine. b, Quantification of 
the data in panel a showing mean ± s.e.m. of three independent biological 
replicates. c-e, Defective SAC in cells expressing a CDC20 mutant that weakly 
binds the MCC. c, HeLa cells expressing siRNA-resistant 3×Flag-tagged 
CDC20 wild-type, or ΔKR, or ΔDR mutants, were treated with siRNA against 
CDC20, synchronized at prometaphase with nocodazole (Noc) and collected 
by mitotic shake-off. Anti-Flag immunoprecipitates were analysed by 
quantitative immunoblotting. Results are representative of three biological 
replicates. Control cells (Ctrl) were depleted of GAPDH. End., endogenous. 
d, Schematic summary of CDC20^ΔDR and CDC20^ΔKR mutants. The 
CDC20^ΔDR mutant can form the MCC, but is only weakly bound and inhibited 
by the MCC. The CDC20^ΔKR mutant cannot form the MCC. e, HeLa cell 
lines of 3Flag-CDC20 were treated as in c, the time from nuclear envelope 
breakdown (NEBD) to mitotic exit measured, and plotted as a box and whisker 
chart where one diamond represents one cell. Red diamonds indicate the 
cell remained in mitosis until the end of the experiment. n, number of cells 
analysed in three independent experiments. 


BUBR1 with the ΔD-box mutant (Fig. 3c) prevented cells from 
arresting in mitosis in response to either nocodazole (Fig. 3d), or Taxol where 
the SAC is much weaker (Extended Data Fig. 5d). Thus, the core MCC 
must inhibit a second CDC20 molecule to impose a functional SAC. 

An important test of our idea that the core MCC inhibited active 
APC/C^CDC20 was whether the core MCC could arrest a mitotic cell in 
which kinetochores could not catalyse further CDC20 incorporation into 
the core MCC (see Extended Data Fig. 6a). To prevent the core MCC 
from disassembling we attached a yellow fluorescent protein (YFP) tag to 
MAD2 (Venus-MAD2) and a green fluorescent protein (GFP)-binding 
domain (GBP) to CDC20 (GBP-CDC20). We called this stable 
complex MCC^M2 (see Extended Data Figs 6 and 7); a similar approach using 
leucine zippers had been used previously in budding yeast. We expressed 
MCC^M2 in cells with normal levels of endogenous CDC20. MCC^M2 was 
able to inhibit the APC/C when the SAC was inactivated in three 
different ways: (1) MCC^M2 imposed a metaphase delay (Fig. 4a), in which 
the kinetochores did not stain for MAD2 (Extended Data Fig. 8a). The 
extent of the delay correlated with the amount of GBP-CDC20, and thus 
the amount of MCC^M2 (Extended Data Fig. 8b); (2) MCC^M2 imposed a 
delay in cells treated with the Mps1 kinase inhibitor reversine to prevent 
core MCC assembly (Fig. 4b); (3) MCC^M2 arrested cells in mitosis after 
depleting the KNL1 (also known as CASC5) kinetochore protein that 

Figure 3 | MCC binds to the second CDC20 through the D-box of BUBR1 
and this is required for the SAC. a, b, The D-box and second KEN box of 
BUBR1 bind to CDC20. a, rMCC containing SBP-BUBR1 wild-type, or ΔD-box, 
or ΔKEN2 (indicated by *), was incubated with 3×Flag-tagged CDC20 (IVT) 
and analysed as in a. ΔD-box, R224A, L226A; ΔKEN2, K304EN mutated to 
AAA. b, Quantification of the data in panel a to show the mean ± s.e.m. of 
four independent biological replicates. c, The D-box mutant of BUBR1 forms 
the MCC. HeLa cells expressing siRNA-resistant 3×Flag-Cerulean-BUBR1 
(3F-Ce-BUBR1), either wild-type or the ΔD-box mutant, were treated with 
siRNA against BUBR1, and prometaphase cells collected by mitotic shake off 
and analysed as in Fig. 2c. Result is representative of three biological replicates. 
d, The D-box of BUBR1 is required for the SAC. HeLa cell lines expressing 
wild-type and ΔD-box mutant 3F-Ce-BUBR1 were treated as in c, 0.33 µM 
nocodazole was added and the time from NEBD to mitotic exit was measured as 
in Fig. 2e. n, number of analysed cells from two independent biological 
replicates. Note that the ΔD-box mutation did not affect the recruitment of 
BUBR1 to unattached kinetochores (Extended Data Fig. 5e). 


is required for the SAC (Extended Data Fig. 8c-e). These data supported 
our idea that the core MCC inhibited active APC/C^CDC20. Moreover, as 
the MCC inhibits the APC/C without further signalling from the kine- 
tochores, it has one of the essential properties required of the diffusible 
‘wait anaphase’ inhibitor, although our data do not prove that it is the 
diffusible inhibitor in vivo. 

All the functional components of the core MCC were required for 
MCC^M2 to inhibit APC/C^CDC20 because we could not delay cells in 
mitosis when we stabilized the binding between MAD2 and CDC20 in the 
absence of BUBR1 (Fig. 4c), nor when we stabilized MAD2 with a 
CDC20-ΔKILR mutant that cannot form the core MCC (Extended Data Fig. 9a). 
Finally, we stabilized the binding between MAD2 and CDC20 (MCC^M2), 
but replaced BUBR1 with the ΔD-box mutant to perturb binding to a 
second CDC20. These complexes were much less effective at inhibiting 
APC/C^CDC20 in vitro (Extended Data Fig. 9b), and unable to delay cells 
in mitosis (Fig. 4c; model in Extended Data Fig. 9c). Thus, we conclude 
that to arrest cells in mitosis the core MCC inhibits a second molecule 
of CDC20 that can even be part of an active APC/C^CDC20. 

Crucial gaps have remained in our understanding of the SAC: notably, 
how the ‘wait anaphase’ signal generated at unattached kinetochores 
inhibits APC/C activity in the rest of the cell. Unattached kinetochores 
appear to catalyse a conformational change in MAD2 to bind CDC20 
and subsequently promote APC/C-MCC formation in the cytoplasm. 
However, it is unlikely that all CDC20 could be bound by MAD2 at the 
kinetochore; therefore, additional mechanisms have been proposed to 
prevent the activation of the APC/C, including cytoplasmic amplification 
of MAD2-CDC20 binding, although this now appears unlikely, 
and phosphorylation of CDC20 by BUB1. We now show how the MCC, 
formed either at kinetochores or in the cytoplasm, could act as a diffusible 

Figure 4 | A stabilized MCC delays anaphase by inhibiting endogenous 
APC/C^CDC20. a, HeLa cells were transfected with plasmids encoding 
mCherry-GBP-CDC20 and either Venus-MAD2 or Venus-BUBR1, and the 
time from NEBD to anaphase was analysed in unperturbed mitoses as in Fig. 2e. 
n, number of cells from three independent biological replicates. b, HeLa cell 
lines stably expressing mCherry-GBP-CDC20 and a tetracycline-inducible 
3×Flag-Venus-MAD2 were treated, or not, with 1 µM reversine (+Rev) 
6 h after release from a thymidine block in the presence (+Tet) or absence 
(−Tet) of tetracycline, and analysed as in a. n, number of cells from two 
independent biological replicates. c, HeLa cell lines in Fig. 3c expressing 
3F-Ce-BUBR1 plus mCherry-GBP-CDC20 and Venus-MAD2 were treated 
with siRNA against BUBR1 and analysed as in panel a. n, number of cells from 
two independent biological replicates. 


inhibitor to inhibit APC/C^CDC20 throughout the cell (Extended Data 
Fig. 10), although our data do not prove that it disseminates the ‘wait 
anaphase’ signal in vivo. Previously, it has been proposed that the 
complex between MAD2 and CDC20 will template the formation of the BBC 
(BUBR1-BUB3-CDC20) complex to inhibit CDC20, although in these 
experiments p31^comet was depleted, which would alter the levels and 
behaviour of checkpoint complexes. While we also find that the 
BBC is an abundant APC/C inhibitor in cells, we show here that 
stabilizing the MCC generates a more potent inhibitor than stabilizing the 
BBC (Fig. 4a; for MCC^R1 see Extended Data Fig. 6b), which agrees with the 
observation that cells containing a greater proportion of MCC over BBC 
exhibit stronger SAC activity. Our results could also solve a further 
conundrum posed by the SAC. MAD2 and the APC/C bind to the same 
KILR motif on CDC20; therefore, CDC20 must dissociate from the 
APC/C to bind MAD2. By analogy with measurements on Cdh1, 
CDC20 is predicted to dissociate slowly from the APC/C (half time of 
dissociation ~25 min), yet reactivating the SAC can inhibit active APC/C 
in less than 5 min. Our finding that MCC rapidly inhibits CDC20 
already bound to the APC/C can help to explain the close temporal cou- 
pling between the SAC and the APC/C. Indeed, our data indicate that 
the MCC prefers CDC20 that is already bound to the APC/C; the reason 
for this will be important to determine in the future. 
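
The kinetic argument in the preceding paragraph can be illustrated with a short calculation. The sketch below is not part of the study: it assumes simple single-step, first-order dissociation, takes the ~25 min half time quoted by analogy with Cdh1, and uses a hypothetical function name.

```python
import math


def fraction_dissociated(t_min: float, half_time_min: float) -> float:
    """Fraction of complexes that have dissociated after t_min minutes.

    Assumes single-step first-order kinetics (an illustrative assumption,
    not a measured property of CDC20 bound to the APC/C).
    """
    if half_time_min <= 0:
        raise ValueError("half_time_min must be positive")
    k_off = math.log(2) / half_time_min  # dissociation rate constant, per minute
    return 1.0 - math.exp(-k_off * t_min)


if __name__ == "__main__":
    # Half time of ~25 min (from the text) and the 5 min window in which
    # reactivating the SAC is reported to inhibit active APC/C.
    released = fraction_dissociated(t_min=5.0, half_time_min=25.0)
    print(f"Fraction of CDC20 released within 5 min: {released:.2f}")
```

Under these assumptions only about 13% of APC/C-bound CDC20 would be released within 5 min, which is consistent with the argument that dissociation followed by MAD2 binding would be too slow to account for the rapid inhibition.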


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


Cell culture and synchronization. HeLa cells were maintained in Advanced D-MEM 
with 2% FBS. For synchronization at the beginning of S phase, HeLa cells were 
treated with 2.5 mM thymidine as previously described. For prometaphase, cells 
were released from a thymidine block and 6 h later treated with nocodazole at a 
final concentration of 0.33 µM for 6-12 h. For SAC-inactivated samples, cells were 
released from a nocodazole block into medium including 1 µM reversine and 10 µM 
MG132 for a further 1 h. 

Transfection with siRNA and DNA. The following ON-TARGETplus (Dharmacon, 
CO, USA) oligonucleotides as previously described were used: CDC20 50 nM 
(CGGAAGACCUGCCGUUACAUU); MAD2 20 nM (GGAAGAGUCGGGACCACAGUU); 
BUBR1 50 nM (GAUGGUGAAUUGUGGAAUA); KNL1 50 nM 
(AAGAUCUGAUUAAGGAUCCACGAAA) and GAPDH (D-001830-01). Cells were 
transfected with short interfering RNA (siRNA) oligonucleotides once or twice 
at the indicated concentrations using Lipofectamine RNAiMAX (Invitrogen). To 
transfect siRNA oligonucleotides and DNA plasmids at the same time, cells were 
treated with Lipofectamine 2000 (Invitrogen). An siRNA-resistant open 
reading frame (ORF) of BUBR1 was generated by mutating the underlined nucleotides 
(GATGGCGAGCTTUGGAAUA). 
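
An siRNA-resistant ORF is normally made with silent mutations, so that the encoded protein is unchanged while the siRNA no longer matches. The following sketch checks this for the two BUBR1 sequences quoted above. It rests on two assumptions that are not stated in the text: that the first nucleotide of each sequence is the first position of a codon, and that U and T can be treated as equivalent. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_dna(seq: str) -> str:
    """Upper-case the sequence and write U as T."""
    return seq.upper().replace("U", "T")


def translate(seq: str) -> str:
    """Translate complete codons, reading from the first nucleotide (assumed frame)."""
    dna = to_dna(seq)
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    a, b = to_dna(a), to_dna(b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


SIRNA_TARGET = "GAUGGUGAAUUGUGGAAUA"   # BUBR1 siRNA sequence from the text
RESISTANT_ORF = "GATGGCGAGCTTUGGAAUA"  # mutated ORF sequence from the text

if __name__ == "__main__":
    print("target peptide:   ", translate(SIRNA_TARGET))
    print("resistant peptide:", translate(RESISTANT_ORF))
    print("mismatches:       ", count_mismatches(SIRNA_TARGET, RESISTANT_ORF))
```

In the assumed reading frame both sequences translate to the same peptide while differing at four nucleotide positions, as expected for silent mutations.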

In vitro reconstituted ubiquitylation assay. In vitro ubiquitylation assays were 
performed as described previously but with modifications to use a 
fluorescently-labelled substrate developed by T. Matsusaka. In brief, CDC20 was depleted by siRNA 
treatment for 48 h before the APC/C was purified with anti-APC3 (AF3.1) 
antibody from mitotic HeLa cell extract. Immunoprecipitates were resuspended in 
ubiquitylation reaction buffer containing E1-ligase, UbcH10 (E2), ubiquitin, ATP, 
ATP regenerating system, and fluorescently-labelled securin as a substrate in QA 
buffer (100 mM NaCl, 30 mM Hepes pH 7.8, 2 mM ATP, 2 mM MgCl2, 0.1 µg µl⁻¹ 
BSA, 1 mM DTT) at 37 °C for the indicated time, and supplied with recombinant 
CDC20 and/or core rMCC as indicated. Recombinant securin protein was labelled 
with IRDye680 dye (IRDye 680LT Maleimide Infrared Dye: LI-COR) according to 
the manufacturer’s instructions and directly scanned with a LI-COR Odyssey CCD 
scanner after SDS-PAGE analysis. Ubiquitylation of CDC20, MAD2 and BUBR1 
were analysed by quantitative immunoblotting. After blotting with primary anti- 
bodies, blots were incubated with fluorescently labelled secondary antibodies and 
the fluorescence measured using a LI-COR Odyssey CCD scanner according to the 
manufacturer’s instructions (LI-COR Biosciences, NE, USA). 
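
The figure legends report unconjugated securin relative to a reference time point (set to 1.0) and summarize independent experiments as mean and s.d. A minimal sketch of that normalization is given below. The band intensities are invented for illustration, the function names are hypothetical, and the actual analysis pipeline used in the study is not described in the text.

```python
from statistics import mean, stdev


def normalize_to_reference(intensities, reference_index=0):
    """Divide every intensity by the value at the reference time point."""
    reference = intensities[reference_index]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return [value / reference for value in intensities]


def summarize(replicates, reference_index=0):
    """Return a (mean, s.d.) pair per time point across replicate experiments."""
    normalized = [normalize_to_reference(r, reference_index) for r in replicates]
    # zip(*normalized) groups the values of all replicates by time point.
    return [(mean(values), stdev(values)) for values in zip(*normalized)]


if __name__ == "__main__":
    time_points_min = [0, 5, 10, 15, 30]
    # Hypothetical scanner intensities for three independent experiments.
    replicates = [
        [1000.0, 800.0, 600.0, 500.0, 300.0],
        [1200.0, 900.0, 780.0, 600.0, 420.0],
        [900.0, 765.0, 540.0, 450.0, 315.0],
    ]
    for t, (avg, sd) in zip(time_points_min, summarize(replicates)):
        print(f"{t:>2} min: {avg:.2f} ± {sd:.2f}")
```

Passing `reference_index=1` would instead set the 5 min value to 1.0, matching the convention described for estimating inhibition when rMCC is added after the reaction has started.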

Expression of mCherry-GBP-CDC20, Venus-BUBR1 and Venus-MAD2. We 
used two types of human expression vectors: pcDNA5-3Flag-Venus (inducible CMV 
promoter) and pmCherry-CAG-C1 (chicken β-actin promoter). In the 
pcDNA5-3Flag-Venus, 3Flag-Venus is inserted into the multiple-cloning site of 
pcDNA5/FRT/TO (Invitrogen). In the pmCherry-CAG-C1 vector, EYFP and CMV promoter 
of pEYFP-C1 (Clontech) were replaced by mCherry and CAG promoter, 
respectively. An siRNA-resistant open reading frame (ORF) of CDC20 and GBP-CDC20 
were cloned into the pmCherry-CAG-C1 vector, and MAD2 and BUBR1 were cloned 
into pcDNA5-3Flag-Venus. All constructs were verified by sequencing and sequences 
are available on request. To co-express mCherry-GBP-CDC20 and Venus-MAD2 
or Venus-BUBR1, the indicated plasmids were co-transfected with the indicated 
siRNA oligonucleotides using Lipofectamine 2000 (Invitrogen). 

Inducible cell lines. To generate cell lines expressing 3Flag-CDC20 or 
3Flag-Cerulean-BUBR1 proteins from an inducible promoter, an siRNA-resistant ORF of 
CDC20 or BUBR1 was cloned into a modified version of pcDNA5/FRT/TO 
(Invitrogen). Those plasmids were transfected into a HeLa-FRT cell line (a gift from 
S. Taylor) and stable cell lines were generated using the Flp-In system (Invitrogen). 
To obtain a cell line expressing 3Flag-Venus-MAD2 from an inducible CMV 
promoter and mCherry-GBP-CDC20 from a constitutive CMV promoter (used 
in Fig. 4b), a HeLa-FRT cell line expressing an inducible 3Flag-Venus-MAD2 was 
transfected with the pmCherry-C1-GBP-CDC20 plasmid and selected with Geneticin 
(Invitrogen). To induce proteins from the inducible promoter, cells were treated 
with tetracycline (1 µg ml⁻¹, Calbiochem) 36 h before analysis. 

Immunoprecipitation and size exclusion chromatography. Cells for 
immunoprecipitation were lysed with HEPES buffer (150 mM KCl, 20 mM Hepes pH 7.8, 
10 mM EDTA, 10% glycerol, 0.2% NP-40, 1 mM dithiothreitol (DTT), Roche 
complete inhibitor cocktail tablet, 0.2 µM microcystin, 1 mM PMSF) for 10 min on ice 
and clarified by a 20,000g spin for 10 min. Protein complexes were 
immunoprecipitated with antibodies (anti-APC4, anti-APC3 (AF3.1), anti-GFP or anti-Flag M2 
epitope) covalently coupled to Protein G Dynabeads (Invitrogen) using HEPES buffer 
for incubation and washing. For size exclusion chromatography analysis, cells were 
pelleted then resuspended in buffer A (140 mM NaCl, 20 mM Hepes pH 7.8, 6 mM 
MgCl2, 5% glycerol, 1 mM DTT, Roche complete inhibitor cocktail tablet, 0.2 µM 
microcystin, 1 mM PMSF) at a 1:1 ratio of buffer to cells, and lysed by nitrogen 
cavitation (1,000 p.s.i., 30 min, Parr Instruments, USA). Lysed cells were centrifuged at 
20,000g for 10 min and 259,000g for 10 min before loading onto a Superose 6 PC 
3.2/30 column (GE Healthcare). The column was run at a flow rate of 25 µl min⁻¹ 
in buffer B (140 mM NaCl, 30 mM Hepes pH 7.8, 5% glycerol, 1 mM DTT) and 50-µl 
fractions collected. 
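
The flow rate and fraction size given above fix how long each fraction takes to collect. The short sketch below works this out; the number of fractions is an arbitrary illustrative value, the function name is hypothetical, and volumes are counted from the start of collection rather than from injection.

```python
def fraction_schedule(flow_rate_ul_per_min, fraction_volume_ul, n_fractions):
    """Return (fraction number, cumulative volume in µl, elapsed time in min)."""
    if flow_rate_ul_per_min <= 0 or fraction_volume_ul <= 0:
        raise ValueError("flow rate and fraction volume must be positive")
    minutes_per_fraction = fraction_volume_ul / flow_rate_ul_per_min
    return [
        (i, i * fraction_volume_ul, i * minutes_per_fraction)
        for i in range(1, n_fractions + 1)
    ]


if __name__ == "__main__":
    # 25 µl per min and 50-µl fractions are from the text; 5 fractions is arbitrary.
    for number, volume_ul, minutes in fraction_schedule(25.0, 50.0, 5):
        print(f"fraction {number}: ends at {volume_ul:.0f} µl, {minutes:.0f} min")
```

At 25 µl min⁻¹ each 50-µl fraction corresponds to 2 min of elution.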

Epifluorescence. Cells were seeded into 8-well dishes (Thistle Scientific, UK) to 
enable experiments to be performed in parallel. Before imaging, the culture medium 
was replaced with Leibovitz’s L-15 medium (Gibco Life Technologies, UK) supple- 
mented with 10% fetal bovine serum and penicillin/streptomycin. Cells were imaged 
on a DeltaVision microscope equipped with an environmental chamber at 37 °C 
(API, USA) with a QuantEM camera (Photometrics, USA) and Lambda LS 
illumination (Sutter, USA) as previously described, or a spinning disc microscope 
(Intelligent Imaging Innovations, Colorado, USA) equipped with a CSU-X1 head 
(Yokogawa, Japan) and a QuantEM:512sc EMCCD camera (Photometrics, USA). 
In Figs 2e, 3d and 4, images of DIC and fluorescence were captured at 6-min intervals 
and the fluorescence intensities were measured and analysed using ImageJ/Fiji 
software as previously described. 
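
Because images were captured at 6-min intervals, the time from NEBD to mitotic exit follows from the frame numbers at which each event is scored. The sketch below shows one way to convert frames to minutes and to flag cells that were still in mitosis when imaging ended (plotted as red diamonds in the figures). The frame numbers and the function name are hypothetical, and the scoring of NEBD and exit themselves is not modelled.

```python
FRAME_INTERVAL_MIN = 6  # imaging interval stated in the text


def mitotic_duration(nebd_frame, exit_frame, last_frame,
                     interval_min=FRAME_INTERVAL_MIN):
    """Return (duration in minutes, censored flag) for one cell.

    exit_frame is None when the cell had not exited mitosis by the final
    frame; the duration is then a lower bound and the cell is flagged.
    """
    if exit_frame is None:
        return (last_frame - nebd_frame) * interval_min, True
    if exit_frame < nebd_frame:
        raise ValueError("exit_frame cannot precede nebd_frame")
    return (exit_frame - nebd_frame) * interval_min, False


if __name__ == "__main__":
    last_frame = 100  # hypothetical final frame of the movie
    # Hypothetical (NEBD frame, exit frame) pairs; None marks a cell still in mitosis.
    cells = [(10, 15), (12, 20), (30, None)]
    for nebd, exit_ in cells:
        minutes, censored = mitotic_duration(nebd, exit_, last_frame)
        note = " (still in mitosis at end of experiment)" if censored else ""
        print(f"NEBD to exit: {minutes} min{note}")
```

With a 6-min interval, each measured duration has a resolution of one frame, so differences shorter than about 6 min between conditions cannot be resolved for individual cells.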

Antibodies. The following antibodies were used at the indicated dilutions. CDC20 
(sc-13162, Santa Cruz Biotechnology) 1:500; CDC20 (A301-180A, Bethyl 
Laboratories) 1:500; BUBR1 (612503, BD Transduction Laboratories); BUBR1 (A300-386A, 
Bethyl Laboratories) 1:500; MAD2 (610679, BD Transduction Laboratories) 1:500; 
MAD2 (A300-301A, Bethyl Laboratories) 1:500; BUB3 (611730, BD Transduction 
Laboratories) 1:500; APC3 (610455, BD Transduction Laboratories) 1:500; APC4 
(monoclonal antibody raised against a carboxy-terminal peptide) 1:500; KNL1 (a 
gift from M. Yanagida and T. Kiyomitsu) 1:50; anti-myc-epitope (9E10, Santa Cruz 
Biotechnology) 1:500; anti-flag epitope (M2, Sigma) 1:5,000; anti-GFP (Clone 3.1 
and 7.1, Roche) 1:200. 

Secondary antibodies: IRDye 680CW donkey anti-mouse (926-68072, LI-COR); 
IRDye 800CW donkey anti-mouse (926-32212, LI-COR); IRDye 680CW donkey 
anti-rabbit (926-32223, LI-COR); IRDye 800CW donkey anti-rabbit (926-32213, 
LI-COR) were all used at 1:10,000. 

Immunofluorescence. Cells were fixed in 4% paraformaldehyde and 2% sucrose 
for 5 min. After fixation cells were blocked in 3% BSA-PBS NP-40 0.2% and then 
incubated with antibodies. All antibodies were diluted in 3% BSA-PBS NP-40 0.2% 
and washes were performed with PBS NP-40 0.2%. Antibodies were used at the 
following dilutions: anti-Flag M2 (Sigma), 1:4,000; anti-GFP (Roche) and anti-CDC20 
(sc-13162, Santa Cruz Biotechnology), 1:400; anti-MAD2 (A300-301A, Bethyl 
Laboratories) and anti-BUBR1 (A300-301A, Bethyl Laboratories), 1:200. Anti-ACA serum 
(a gift from W. Earnshaw) was used at 1:20,000. Secondary antibodies conjugated 
to Alexa Fluor 488, Alexa Fluor 568 or Alexa Fluor 647 (Molecular Probes) were 
diluted 1:400. DNA was stained with Hoechst-33342. 

Extended Data Figure 1 | Recombinant human mitotic checkpoint complex 
binds to a second CDC20. a, Schematic illustration of purification steps. 
Human wild-type CDC20 (untagged), SBP-BUBR1 and His-MAD2 were 
expressed in baculovirus-infected Sf9 cells. The recombinant core mitotic 
checkpoint complex (rMCC) was purified by nitrilotriacetic acid (Ni-NTA) 
and streptavidin beads. Purified core rMCC bound to streptavidin beads was 
used to assay binding to purified recombinant CDC20. b, Core rMCC consisting 
of CDC20, SBP-BUBR1 and His-MAD2 was analysed by SDS-PAGE and 
Coomassie blue R250 staining, followed by quantification at 680 nm on a 
LI-COR Odyssey scanner. Equal molar amounts of purified SBP-BUBR1, 
SBP-CDC20 and SBP-MAD2 proteins were used to calibrate the Coomassie blue 
staining. The stoichiometry of core rMCC (mean ± s.d. is shown below the 
panel with SBP-BUBR1 set to 1.0) was estimated from three independently 
purified core rMCC preparations. Molecular mass markers are on the left. 
c, d, Both the MAD2 binding motif of CDC20 and the first KEN box of BUBR1 
are required to assemble rMCC. Core rMCC was pulled down with streptavidin 
beads from Sf9 cells expressing SBP-BUBR1, His-MAD2 and either wild-type 
(WT) CDC20 or the K129ILR/AAAA mutant (ΔKILR) (c), or His-MAD2, 
wild-type CDC20 plus wild-type SBP-BUBR1, or alanine substitution mutants of 
either KEN box 1 (ΔKEN1) or KEN box 2 (ΔKEN2) (d). The proteins retained on 
streptavidin beads were analysed by immunoblotting with the indicated 
antibodies. e, Relative expression levels of core rMCC components. Sf9 cell 
extracts expressing the core rMCC, and the purified rMCC complex, were 
analysed by quantitative immunoblotting. The ratio of the proteins in the 
extracts is given, with that of SBP-BUBR1 set to 1.0. f, Schematic illustration of the 
second CDC20 binding assay in Fig. 1a. In lanes 1 and 2, the streptavidin 
beads were incubated with either His-CDC20 wild-type or the ΔKILR 
(K129ILR/AAAA) mutant. In lanes 3 and 4, the streptavidin beads bound to 
core rMCC were incubated with the His-CDC20 proteins. In lanes 6 and 7, 
the streptavidin beads bound to His-SBP-CDC20 were incubated with the 
His-CDC20 proteins. g, Sf9 cell extracts expressing core rMCC or 3Flag-tagged 
CDC20 were mixed and the core rMCC purified as in a. The core rMCC 
was analysed by quantitative immunoblotting. 51% of the core rMCC was 
purified bound to a second 3Flag-CDC20. h, A functional CDC20 promotes the 
binding of core rMCC to the APC/C. The APC/C was immunoprecipitated 
from CDC20-depleted mitotic extracts supplemented with a constant amount 
of core rMCC and tenfold excess of recombinant wild-type SBP-CDC20, or 
the ΔKILR or ΔIR mutants. The co-immunoprecipitates were analysed as in 
Fig. 1c. i, Schematic of the APC/C-MCC-CDC20 ternary complex. Both core 
rMCC and CDC20 bind to the APC/C and form a ternary complex (left). 
The CDC20^ΔKILR mutant cannot bind the APC/C directly, nor stimulate core 
rMCC binding to the APC/C, but CDC20^ΔKILR still binds to rMCC (right). 
All results are representative of two or more independent biological replicates. 

Extended Data Figure 2 | Comparison of rMCC with and without BUB3. 
a, b, Preparation of recombinant core MCC with or without BUB3. Insect cells 
were infected with viruses expressing core MCC components with and without 
BUB3, and the rMCC was purified by Ni-NTA and streptavidin beads. The 
complexes were analysed by Coomassie blue (CB) staining (a) and 
immunoblotting (b). c, Binding to a second His-CDC20 of recombinant core 
MCC with or without BUB3 was performed and analysed as in Fig. 1a. All 
results are representative of two independent biological replicates. 

Extended Data Figure 3 | Molar ratios of rMCC, CDC20 and the APC/C in 
the in vitro ubiquitylation assays. a, b, Core rMCC and CDC20 from Fig. 1d 
were analysed by quantitative immunoblotting. CDC20, MAD2 and BUBR1 
were analysed by quantitative immunoblotting in the input (a) and in the 
reaction (b). The black filled circles are unconjugated SBP-CDC20; red filled 
circles are ubiquitylated SBP-CDC20. c, Core rMCC, rCDC20, and the APC/C 
immunoprecipitates used in Fig. 1e, plus a purified tagged APC3 subunit, were 
analysed by quantitative immunoblotting with the indicated antibodies. The 
calculated molar ratios of rMCC, rCDC20 and the APC/C are shown below 
the panels. 

Extended Data Figure 4 | Cells expressing the D-box and KEN box receptor 
mutants of CDC20 can degrade cyclin B1 in nocodazole. Cyclin B1-Venus 
degradation was analysed in siRNA CDC20-treated cells rescued with 
siRNA-resistant versions of 3×Flag-CDC20, wild-type, or ΔDR, or ΔKR 
mutants, in the presence of nocodazole (0.33 µM). The fluorescence of 
individual cells was measured, the value at NEBD set to 1 and the mean ± s.e.m. 
for all cells plotted. n, number of cells analysed from at least two independent 
experiments. 

Extended Data Figure 5 | Characterization of the MCC containing D-box or 
KEN box 2 mutants of BUBR1. a, Core rMCC assembled with SBP-BUBR1 
wild type, or ΔD-box, or ΔKEN2 mutants, was purified as in Extended Data 
Fig. 1a, b and analysed on a LI-COR Odyssey scanner at 680 nm after 
SDS-PAGE and Coomassie blue R250 staining. b, The core rMCC mutants 
prepared in a were assayed as APC/C inhibitors in an in vitro ubiquitylation 
assay as in Fig. 1d. c, Insect cell extracts expressing CDC20 with SBP-BUBR1, 
either wild type, or ΔD-box or ΔKEN2 mutants, were incubated with 
streptavidin beads. The proteins retained on the streptavidin beads were 
analysed by quantitative immunoblotting. Results in panels a–c are 
representative of two independent biological replicates. d, HeLa cells were 
treated with siRNA against BUBR1 and rescued with 3×Flag-Cerulean-BUBR1, 
either wild-type or the ΔD-box mutant, and mitosis analysed in 
0.116 µM Taxol as in Fig. 3d. The time from NEBD to anaphase (or mitotic exit) 
was measured and plotted as a box and whisker chart. n, number of analysed 
cells from two independent biological replicates. e, HeLa cells were treated 
with siRNA against BUBR1 and rescued with siRNA-resistant 3×Flag-Cerulean-BUBR1, 
either wild type or the ΔD-box mutant, then analysed by 
immunostaining. Cells were stained with anti-Flag M2 and anti-ACA 
antibodies, and Hoechst 33342, and representative images of prometaphase 
cells from two independent biological replicates are shown. Scale bar, 10 jum. 
Extended Data Figure 6 | Stabilizing the interaction between MAD2 and
CDC20. a, Schematic of how a stabilized MCC might block cells in metaphase. 
At prometaphase, when the SAC is ‘ON’, CDC20 is inhibited both by 
incorporation into the MCC and through binding to the MCC. At metaphase 
when the SAC is ‘OFF’, CDC20 is released from the MCC and activates the 
APC/C. We postulate that stabilizing an exogenous MCC to prevent its
disassembly should also prevent endogenous CDC20 from activating the
APC/C, which results in an anaphase delay. b, Schematic of a stabilized MCC.
To stabilize the MCC we took advantage of the binding between yellow 
fluorescent protein (Venus) and GFP-binding domain (GBP), which is a 13kDa 
domain from a camelid antibody that binds strongly and specifically to GFP 
and YFP. MAD2 and BUBR1 were tagged with Venus and the GBP domain
was tagged to CDC20. We refer to the MCC containing a stabilized
MAD2-CDC20 interaction as MCC™, and that with stabilized BUBR1-CDC20
as MCC®". c, GBP- and Venus-fusion proteins bind stably to each other
in vivo. HeLa cell lines expressing siRNA-resistant myc-CDC20 or myc-GBP- 
CDC20 were transfected with plasmids encoding either Venus alone or 
Venus-MAD2, followed by siRNA treatment against CDC20. After a single 
thymidine block and release, the cells were arrested at prometaphase by treating 
with nocodazole, and harvested by mitotic shake-off 48 h after the siRNA 
treatment. Proteins were immunoprecipitated with anti-myc epitope 
antibodies before analysis by quantitative immunoblotting with the indicated 
antibodies. WT, myc-CDC20; GBP, myc-GBP-CDC20. Results in panel c are 
representative of three independent biological replicates. 
Extended Data Figure 7 | Stabilizing the interaction between MAD2 and
CDC20 prevents disassembly of the MCC in vivo. a, b, Tethering CDC20 to
MAD2 prevents MCC disassembly and release from the APC/C. a, Empty
plasmids or plasmids encoding Venus-MAD2 were transfected into HeLa cell
lines expressing 3×Flag-GBP-CDC20 and the cells synchronized at
prometaphase by thymidine release followed by a nocodazole block. Cells were 
harvested by mitotic shake off and separated into two cultures after washing 
once in medium. One culture was harvested immediately (—reversine) and the 
other resuspended in medium containing 1 μM reversine and 10 μM MG132
(+reversine) for 1h before harvesting. The APC/C was immunoprecipitated 
with an anti-APC4 antibody and the immunoprecipitates analysed by 
quantitative immunoblotting. We note that the APC/C preferred to bind 
endogenous CDC20 over GBP-CDC20 as the co-activator in vivo (see 
+reversine lane in control cells) but the MCC” did not sequester endogenous 
CDC20 from the APC/C (see +reversine lane in GBP-CDC20 + Venus-MAD2
cells). b, Mean ± s.e.m. of the relative amounts of the indicated proteins
in the APC4 immunoprecipitates calculated from four independent biological 
experiments. The amount of protein bound to the APC/C in the absence of 
reversine was set to 1 (red line). c–e, Tethering CDC20 to MAD2 prevents MCC
disassembly and release from the APC/C in the absence of endogenous CDC20. 
c, Plasmids encoding Venus-MAD2 were transfected into HeLa cell lines 
expressing the indicated CDC20 fusion proteins following siRNA treatment 
against CDC20 for 48 h. Cells were synchronized at prometaphase then treated 
with reversine, and anti-APC4 and anti-GFP immunoprecipitates were
analysed as in a. WT, myc-CDC20; GBP, myc-GBP-CDC20. Note that 
endogenous CDC20 could not be inhibited through exchange into MCC”? 
because a core MCC composed of Venus-MAD2 and untagged CDC20 
disassembled. d, HeLa cell lines expressing myc-CDC20 (upper blots) or 
myc-GBP-CDC20 (lower blots) were transfected with a plasmid encoding 
Venus-MAD2 followed by siRNA treatment against CDC20 for 48 h. Cells 
were synchronized at prometaphase and treated with reversine as indicated 
in a. Total cell extracts were analysed by size exclusion chromatography
on a Sepharose 6 column and fractions were analysed by quantitative
immunoblotting against the indicated proteins and the relative amounts of 
Venus-MAD2 plotted in panel e with the sum of Venus-MAD2 intensities set 
to 1. The migration of APC/C or APC/C-MCC is annotated below panel d. 
All results are representative of three independent biological replicates. 
Extended Data Figure 8 | KNL1 (also known as CASC5) is not required for a
stabilized MCC to inhibit anaphase. a, HeLa cells expressing MCC in 
Fig. 4a were analysed by immunostaining. The cells were stained with anti-GFP, 
anti-MAD2, anti-ACA and Hoechst 33342, and representative images of 
prometaphase and metaphase cells from two independent biological replicates 
are shown. Scale bar, 5 μm. b, The time from NEBD to anaphase in Fig. 4a was
plotted against the intensity of mCherry-GBP-CDC20 (left) or the ratio of 
Venus-MAD2 to mCherry-GBP-CDC20 (right). The ratio of Venus-MAD2 
to mCherry-GBP-CDC20 was calibrated by measuring fluorescence intensity 
of an mCherry-GBP-Venus fusion protein in HeLa cells. c–e, MCC” delays
anaphase when KNL1 is depleted. c, HeLa cells were treated with siRNA against
KNL1 for 72 h and total cell extracts were analysed by quantitative
immunoblotting with the indicated antibodies. d, HeLa cells treated as in c were 
analysed by immunostaining. The cells were stained with anti-CDC20, anti-BUBR1,
anti-ACA and Hoechst 33342, and representative images of early
prometaphase from two independent biological replicates are shown. Scale bar,
10 μm. e, HeLa cell lines stably expressing mCherry-GBP-CDC20 and an
inducible 3×Flag-Venus-MAD2 (expressed from a tetracycline-inducible
promoter) were treated with siRNA against KNL1 as in c. Progression
through mitosis was analysed in the presence (+Tet) or absence (−Tet) of
tetracycline, and analysed as in Fig. 4b. n, number of cells from two independent
biological replicates. 
Extended Data Figure 9 | Functional MCC™? is required to delay anaphase.
a, HeLa cells were transfected with plasmids encoding Venus-MAD2 and
either wild-type or a MAD2-binding defective (ΔKILR) mutant of CDC20
tagged with mCherry-GBP, and mitotic progression was analysed as in Fig. 4a.
n, number of cells from three independent biological replicates. b, The core
rMCC mutants used in Extended Data Fig. 5a were incubated with preformed
APC/C^CDC20 and assayed as APC/C inhibitors in an in vitro ubiquitylation
assay as in Fig. 1e. The extent of APC/C inhibition (incubation of MCC" set
to 1.0) is shown below the securin panel. This result is representative of two
independent experiments. c, Schematic of the inhibitory activities of the 
stabilized MCCs in BUBR1-depleted cells used in Fig. 4c. When BUBR1 is 
depleted, MAD2 and CDC20 cannot form the MCC to inhibit endogenous 
CDC20 (left). When rescued with wild-type BUBR1, MCC™ can form and
inhibit endogenous CDC20 to delay anaphase. By contrast, when rescued by the
BUBR1 ΔD-box mutant, MCC™ can only weakly inhibit endogenous CDC20
and cells can proceed into anaphase. 
Extended Data Figure 10 | Model for how the MCC could disseminate the
‘wait anaphase’ signal. Unattached kinetochores catalyse MCC formation
and the MCC disseminates the ‘wait anaphase’ signal through the cytoplasm
(black arrows). When the MCC disassembles (blue arrows), this releases
CDC20, which along with newly synthesized CDC20, can have two fates: to be
recruited to unattached kinetochores and incorporated into the MCC, or to
bind the APC/C to form APC/C^CDC20. The MCC is able to inhibit both
unbound CDC20 and CDC20 bound to the APC/C (red bars).
Uncovering the polymerase-induced cytotoxicity of an oxidized nucleotide


Bret D. Freudenthal, William A. Beard, Lalith Perera, David D. Shock, Taejin Kim, Tamar Schlick & Samuel H. Wilson


Oxidative stress promotes genomic instability and human diseases. A
common oxidized nucleoside is 8-oxo-7,8-dihydro-2′-deoxyguanosine,
which is found both in DNA (8-oxo-G) and as a free nucleotide
(8-oxo-dGTP). Nucleotide pools are especially vulnerable to oxidative
damage. Therefore cells encode an enzyme (MutT/MTH1) that removes
free oxidized nucleotides. This cleansing function is required for cancer
cell survival and to modulate Escherichia coli antibiotic sensitivity
in a DNA polymerase (pol)-dependent manner. How polymerases discriminate
between damaged and non-damaged nucleotides is not well
understood. This analysis is essential given the role of oxidized nucleotides
in mutagenesis, cancer therapeutics, and bacterial antibiotics.
Even with cellular sanitizing activities, nucleotide pools contain enough
8-oxo-dGTP to promote mutagenesis. This arises from the dual
coding potential where 8-oxo-dGTP(anti) base pairs with cytosine and
8-oxo-dGTP(syn) uses its Hoogsteen edge to base pair with adenine.
Here we use time-lapse crystallography to follow 8-oxo-dGTP insertion
opposite adenine or cytosine with human pol β, to reveal that
insertion is accommodated in either the syn- or anti-conformation,
respectively. For 8-oxo-dGTP(anti) insertion, a novel divalent metal
relieves repulsive interactions between the adducted guanine base and
the triphosphate of the oxidized nucleotide. With either templating
base, hydrogen-bonding interactions between the bases are lost as the
enzyme reopens after catalysis, leading to a cytotoxic nicked DNA repair
intermediate. Combining structural snapshots with kinetic and computational
analysis reveals how 8-oxo-dGTP uses charge modulation
during insertion that can lead to a blocked DNA repair intermediate.

A primary defence mechanism against oxidative DNA damage is base
excision repair, which in eukaryotes utilizes pol β. During times of
oxidative stress, pol β can perform futile repair by inserting 8-oxo-dGTP
opposite cytosine (Cy) or adenine (Ad) (Fig. 1a–c), and it is implicated
in driving tumorigenesis. Pol β binds to gapped DNA in an open
conformation; upon binding the incoming nucleotide it undergoes a
conformational change to form the pre-catalytic closed complex with
two active site divalent metal ions: the catalytic (Mg_c) and nucleotide
Figure 1 | 8-Oxo-dGTP specificity and insertion opposite adenine.
a, 8-Oxo-dGTP base pairing with Cy or Ad. b, Pathways associated with
8-oxo-G DNA repair for the A to C transversion. Dashed lines are pol β insertion
events. c, Pol β insertion efficiency of 8-oxo-dGTP and dGTP opposite either
templating Cy or Ad. Discrimination for the preferred nucleotide is indicated.
d, 8-Oxo-dGTP(syn):Ad pre-catalytic complex. e, Phosphodiester bond
formation after a 20 s soak in MgCl₂ with the reactant (green) and product
(yellow) states. f, Closed 8-oxo-dGMP:Ad product complex after a 40 s soak in
MgCl₂. g, Open 8-oxo-dGMP:Ad product complex after a 90 s soak. Closed
conformation is shown in green (Protein Data Bank accession number 2FMS).
Fo − Fc maps (3σ) are in green. Ca²⁺, Mg²⁺, and Na⁺ are orange, red, and
purple spheres, respectively. The catalytic, nucleotide, and product metals are
denoted with subscripts c, n, and p, respectively.
(Mg_n) metals. This complex is optimized for nucleotidyl transfer,
forming pyrophosphate (PPi), and following catalysis, pol β reopens,
releasing PPi.

Soaking open binary crystals of pol β bound to DNA containing a
templating Ad in a cryosolution with 8-oxo-dGTP and CaCl₂ results
in a closed pre-catalytic ground state ternary complex (Extended Data
Table 1). The incoming 8-oxo-dGTP(syn) Hoogsteen base pairs with Ad
(Fig. 1d). The active site is in a conformation similar to that previously observed
using a dideoxy-terminated primer (Extended Data Fig. 1a), with a
change in the primer terminus sugar pucker to C3′-endo (Extended Data
Table 2). Unique active site interactions include Asn 279 hydrogen bonding
to O8 of 8-oxo-dGTP(syn) and an intramolecular hydrogen bond
between N2 and the pro-Sp oxygen on Pα of 8-oxo-dGTP(syn) (Fig. 1d).
This is consistent with previous studies identifying a role of Asn 279 in
stabilizing 8-oxo-dGTP(syn).

To observe catalysis the ground state crystals were transferred to a
solution containing MgCl₂ for 20 s (Extended Data Table 1). Density corresponding
to both the reactant and product was observed (Fig. 1e and
Extended Data Fig. 1b), and on the basis of occupancy refinement the
reaction was 60% ‘complete’. Compared with the ground state there is
only moderate movement in the active site, at Pα and O3′ (Extended
Data Fig. 1b, c). The Hoogsteen base pairing and hydrogen bonding interactions
that stabilize the planar syn-conformation are maintained, and
the enzyme remains in the closed conformation. These observations
are consistent with 8-oxo-dGTP insertion opposite Ad exhibiting a high
catalytic efficiency (Fig. 1c).


Figure 2 | 8-Oxo-dGTP insertion opposite cytosine. a, 8-Oxo-dGTP:Cy
pre-catalytic active site. b, A 90° rotation relative to a. Coordinating waters
(blue) with distances (in ångströms) are indicated. Clash at O8 is shown with
red dashes. The Cy backbone shift is indicated. c, Overlay of 8-oxo-dGTP(anti)
and dGTP(anti) opposite Cy is shown in yellow or green, respectively. The
shift from the adducted O8 is indicated. d, Catalytic efficiency of 8-oxo-dGTP
Extending the MgCl₂ soak time to 40 s resulted in complete turnover
of the reactant 8-oxo-dGTP(syn) in the closed polymerase conformation
(Extended Data Table 1 and Fig. 1f). This post-chemistry complex
shows the Hoogsteen base pairing and the intramolecular hydrogen
bond at N2 are maintained, and Asn 279 hydrogen bonds with O8. The
catalytic Mg²⁺ has been replaced by Na⁺, while the nucleotide Mg²⁺
remains in the active site coordinating PPi (Extended Data Fig. 1d, e). The
closed product complex contains a new Mg²⁺ product metal (Mg_p) that
bridges the backbone phosphate of 8-oxo-dGMP(syn) and PPi.

Extending the MgCl₂ soak to 90 s results in a closed-to-open conformational
change (Fig. 1g). The PPi and associated metals have dissociated,
and the inserted 8-oxo-dGMP(anti) has lost the interaction between
Asn 279 and O8, promoting destabilization of the Hoogsteen base pairing.
The inserted 8-oxo-dGMP has a high B-factor (57 Å², Extended Data
Table 1), with the most stable position displaced into the major groove
and a weak hydrogen bond formed between N6 of Ad and N3 of
8-oxo-dGMP (Extended Data Fig. 1f).

The insertion efficiency of 8-oxo-dGTP opposite Cy is much lower than
opposite Ad (Fig. 1c); this may arise from a clash between O8 and Pα of
8-oxo-dGTP(anti) (Fig. 1a). To probe how this clash is accommodated,
we soaked open binary pol β complex crystals with a templating
Cy in a cryosolution containing 8-oxo-dGTP/CaCl₂. The resulting closed
ternary ground state complex contains 8-oxo-dGTP(anti) Watson–Crick
base pairing with the templating Cy (Extended Data Table 3 and Fig. 2a).
The clash at O8 is partly eased by an altered sugar pucker, glycosidic angle,
buckle, and shear compared with dGTP(anti) (Extended Data Table 2).
a 40 s soak in MgCl₂ with the reactant (green) and product (yellow) species
shown. A close-up without density is in the adjacent panel. Fo − Fc maps (3σ)
are in green. Ca²⁺, Mg²⁺, and Na⁺ are orange, red, and purple spheres,
respectively. The catalytic, nucleotide, ground, and product metals are denoted
with a subscript c, n, g, and p, respectively.
The incoming 8-oxo-dGTP(anti) N3 and N2 atoms are within hydrogen-bonding
distance of Asn 279 and Arg 283, respectively (Fig. 2b). To
determine whether these are unique contacts, we solved the structure
of dGTP(anti) Watson–Crick base pairing with Cy in the presence of
CaCl₂ (Extended Data Table 3). Comparing these two structures indicates
the contact with Arg 283 may be unique to the incoming 8-oxo-dGTP(anti)
because O8 causes the triphosphate and base to move 1.1 Å
apart (Fig. 2c). This shift causes the phosphate backbone of Cy to
move 2.6 Å into the minor groove (Fig. 2b). Mutating Arg 283 to lysine
or alanine reduced the 8-oxo-dGTP specificity from favouring 8-oxo-dGTP(syn)
insertion opposite Ad by 40-fold for wild-type, to sixfold
and twofold for the R283K and R283A mutants, respectively (Fig. 2d).
This loss of discrimination against insertion of 8-oxo-dGTP(anti) opposite
Cy indicates Arg 283 promotes the mutagenic insertion of 8-oxo-dGTP(syn)
by acting as a steric gate to prevent insertion opposite Cy.
Interestingly, during the bypass of 8-oxo-G in the templating position,
Arg 283 stabilizes the mutagenic syn-conformation.
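The fold values quoted above can be read as ratios of catalytic efficiencies (kcat/Km) for insertion opposite Ad versus Cy. The Python sketch below illustrates that arithmetic under stated assumptions: the function names are hypothetical, discrimination is taken to be a simple efficiency ratio, and only the 40-, six- and twofold figures come from the text. It is not the authors' kinetic analysis.

```python
def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km; units follow the inputs."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def discrimination(eff_preferred, eff_other):
    """Fold preference for the preferred insertion over the other one."""
    return eff_preferred / eff_other


# Fold preference for insertion opposite Ad over Cy, as quoted in the text.
FOLD_AD_OVER_CY = {"wild-type": 40.0, "R283K": 6.0, "R283A": 2.0}


def loss_of_discrimination(folds, reference="wild-type"):
    """How many times weaker each variant's preference is than the reference."""
    ref = folds[reference]
    return {name: ref / fold for name, fold in folds.items()}
```

With the quoted values, `loss_of_discrimination(FOLD_AD_OVER_CY)` reports roughly a 6.7-fold loss for R283K and a 20-fold loss for R283A relative to wild-type.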

The changes described above do not fully alleviate the clash between
O8 and the sugar-phosphate backbone of the incoming 8-oxo-dGTP(anti)
(red dashes in Fig. 2b). Surprisingly, this clash is accommodated by an
additional divalent metal (Ca_g) observed coordinating the pro-Sp oxygen
of Pα and five water molecules (Fig. 2a, b). Two of the water molecules
are also within hydrogen bonding distance of O8 (Fig. 2b). To verify the
presence of this additional divalent metal-binding site near Pα, we soaked
the closed pre-catalytic ground state complex (8-oxo-dGTP:Cy) in MnCl₂
for 5 s (Extended Data Table 3). This allowed metal exchange before
catalysis had appreciably occurred (Extended Data Fig. 2a). The presence
of Mn²⁺ in the catalytic, nucleotide, and ground state sites was
Figure 3 | Product complex with 8-oxo-dGMP(anti):cytosine. a, Closed
8-oxo-dGMP:Cy product complex after a 60 s soak in MgCl₂. b, Active site of
the closed 8-oxo-dGMP:Ad product complex with coordination distances
(in ångströms) and waters (blue). c, A 90° rotation relative to a. The clash at O8
is shown with red dashed lines. d, Overlay of the ground (green) and product
(yellow) 8-oxo-dGTP(anti):Cy complexes. The distance (in ångströms)
verified by anomalous density. Overlaying the ground state complex
with Ca²⁺ and Mn²⁺ verified the ground state metal site binds divalent
cations (Extended Data Fig. 2b).

To observe 8-oxo-dGTP(anti) insertion we soaked a ground state
crystal in MgCl₂ for 40 s (Extended Data Table 3). Pol β remained in
the closed conformation with density corresponding to both product
and reactant species (Fig. 2e). Occupancy refinement indicates the reaction
is 40% complete. The Watson–Crick base pairing interactions are
maintained with only moderate movement at the reacting atoms (Pα and
O3′; Fig. 2e). The phosphate backbone of the templating Cy is observed
in two conformations corresponding to reactant and product. Surprisingly,
density for both ground state and product metals can be observed,
indicating there are two distinct populations within the crystal (Extended
Data Fig. 3).

Soaking pre-catalytic complex crystals in MgCl₂ for 60 s results in a
closed product complex (Extended Data Table 3). 8-Oxo-dGTP(anti)
has been completely inserted with the sugar pucker shifting from
C4′-exo (ground state) to C3′-endo (Extended Data Table 2 and Fig. 3a).
Importantly, there is no density corresponding to the ground state metal
and only the product-associated metal is present (Fig. 3a, b). The clash
at O8 has not been fully alleviated in the product complex and is probably
being mediated by the product Mg²⁺ (Fig. 3c). Overlaying the closed
ground (0 s) and product states (60 s) indicates that ground state and
product metal-binding sites are in distinct positions separated by 2.0 Å
that are dependent on having either substrate or product present (Fig. 3d).
Extending the soak time in MgCl₂ to 120 s resulted in pol β transitioning
to an open conformation (Extended Data Table 3). Watson–Crick
base pairing is lost, 8-oxo-dGMP stacks over the templating base with
between the ground and product metal sites is shown with a dashed line.
e, Open 8-oxo-dGMP:Cy product complex after a 120 s soak in MgCl₂. Closed
conformation is shown in green (Protein Data Bank accession number 2FMS).
Fo − Fc maps (3σ) are in green. Ca²⁺, Mg²⁺, and Na⁺ are orange, red, and
purple spheres, respectively.
a high B-factor (65.4 Å²), and the remaining PPi and associated metals
have dissociated (Fig. 3e).

Quantum mechanical analysis allows the point charge on each atom
of dGTP(anti) and 8-oxo-dGTP(anti) to be calculated (Extended Data
Table 4 and Extended Data Fig. 4). Figure 4a illustrates the charge difference
between 8-oxo-dGTP and dGTP mapped onto each atom of dGTP.
The adducted O8 causes the oxygen bridging the sugar moiety and triphosphate
(O5′) to become more positive. Likewise, the pro-Sp oxygen
of Pα becomes more negative and may facilitate recruitment of the
ground state metal. We also calculated the charge on each atom of
8-oxo-dGTP(anti) with all three metals (Extended Data Table 4 and Extended
Data Fig. 4). Figure 4b shows the difference in charge of 8-oxo-dGTP(anti)
with three metals from that with two metals (Ca_c and Ca_n) mapped onto
the structure of 8-oxo-dGTP(anti). The largest electronegative change
is localized on the triphosphate at key catalytic atoms. This includes
making Pα and Pβ more positive, while their bridging oxygen becomes
more negative.
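The per-atom charge comparison described above amounts to subtracting one set of point charges from another, atom by atom. A minimal Python sketch follows, assuming the charges are available as dictionaries keyed by atom name. The function names are hypothetical and any numbers passed in are placeholders, not values from Extended Data Table 4.

```python
def charge_difference(modified, reference):
    """Per-atom charge difference (modified minus reference).

    Atoms present in only one molecule (for example the adducted O8)
    cannot be subtracted and are returned separately.
    """
    shared = modified.keys() & reference.keys()
    delta = {atom: modified[atom] - reference[atom] for atom in shared}
    unmatched = sorted((modified.keys() | reference.keys()) - shared)
    return delta, unmatched


def largest_changes(delta, n=3):
    """The n atoms with the largest absolute change in charge."""
    ranked = sorted(delta.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]
```

A positive difference means the atom is more positive in the modified nucleotide. The same function covers the three-metal versus two-metal comparison if both charge sets are computed on the same atoms.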

Molecular dynamics simulations using the pre-catalytic 8-oxo-dGTP(anti)
or dGTP(anti) opposite Cy structures indicate a stable Mg_g coordination
sphere (Extended Data Table 5). With 8-oxo-dGTP(anti), one
of the Mg_g-coordinating water molecules forms a hydrogen bond with
O8, while the inter-atomic distances in the active site maintain proper
catalytic values (Extended Data Fig. 5a, b). In comparison, the dGTP(anti)
system exhibits poor geometry. The average distance of Pα–O3′ increases
to 5.25 ± 1.0 Å (Extended Data Fig. 5c), and the coordination between
Mg_c and O3′ is broken owing to a newly established coordination network
with another water molecule at 40 ns. All these events in the
dGTP(anti) system induce larger root mean squared deviations than those
of the 8-oxo-dGTP(anti) system when the evolving molecular dynamics
system is compared with the initial structure, indicating that Mg_g in the
dGTP(anti) system is less favourable and associated with a geometry much less
competent for chemistry than that of the 8-oxo-dGTP(anti) system
(Extended Data Fig. 5c, d).
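The two quantities compared in this paragraph, the root mean squared deviation of each frame from the initial structure and the mean ± s.d. of an inter-atomic distance such as Pα–O3′, can be sketched in plain Python as below. This is an illustration under assumptions, not the simulation analysis used in the study: coordinates are taken as lists of (x, y, z) tuples in ångströms, frames are assumed to be already superimposed on the reference, and all function names are hypothetical.

```python
import math


def rmsd(frame, reference):
    """Root mean squared deviation between two pre-aligned coordinate sets."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same atoms")
    total = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(total / len(frame))


def distance(p, q):
    """Euclidean distance between two atoms (requires Python 3.8+)."""
    return math.dist(p, q)


def mean_sd(values):
    """Mean and sample standard deviation of a distance time series."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd
```

Applying `rmsd` to every frame against the first one gives the kind of trace used to compare the two systems, and `mean_sd` over per-frame `distance` values gives a summary of the form 5.25 ± 1.0 Å.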

Interestingly, we captured 8-oxo-dGTP(anti) Watson–Crick base pairing
in a nearly identical manner to that observed for dGTP(anti). This
required recruitment of an additional divalent metal near Pα of
8-oxo-dGTP(anti) that forms a stable hydration shell within hydrogen bonding
Figure 4 | Charge modulation of the polymerase active site. a, The charge
difference for each atom of dGTP(anti) and 8-oxo-dGTP(anti) is plotted onto
dGTP with a colour key shown. The only value that does not fall within the
indicated range is C8 (0.4e ). b, The charge difference for each atom of
8-oxo-dGTP(anti) with three and two metals is mapped onto 8-oxo-dGTP with
a colour key shown. c, Proposed model for the catalytic mechanism of
distance to the adducted oxygen. Consequently, this metal helps alleviate
the clash at O8 and permits binding of 8-oxo-dGTP(anti) without repositioning
Pα. A similar phenomenon is observed in the 8-oxo-dGTP(syn)
conformation, where the exocyclic N2 forms an intramolecular hydrogen
bond with Pα, stabilizing good nascent base pair geometry. This
implies that the interaction of Pα with either N2 or a divalent metal cation
is a hallmark of 8-oxo-dGTP insertion during mutagenic and
non-mutagenic insertion, respectively.

Damaged substrates complicate the ability of DNA polymerases to
select the correct nucleotide. Structural studies have identified that
correct and incorrect non-damaged nucleotides are discriminated from
each other on the basis of proper alignment of catalytic atoms. The planar
nature of 8-oxo-dGTP places O3′ of the primer terminus near Pα so
that only minor structural rearrangements are needed for nucleotidyl
transfer. Insertion of a correct nucleotide results in a stable ternary product
complex, while incorrect insertion promotes rapid re-opening of
the enzyme. Immediately following 8-oxo-dGTP insertion the polymerase
reopened, similar to an incorrect insertion and implying
8-oxo-dGMP promotes instability. These findings show that 8-oxo-dGTP utilizes
characteristics of both correct and incorrect insertion elements.

Recent structural studies have identified a transient third metal-binding
site associated with the products. For pol β, this third metal
(Mg_p) was only observed in the product complex following insertion of
the correct, but not incorrect, nucleotide. The appearance of the product
metal was unexpected following 8-oxo-dGTP(syn) misinsertion
opposite Ad, but consistent with the good base pair geometry exhibited
by this mispair. It appears that these adjunct metal sites are necessary
to neutralize negative charge that may be inherent in the substrate or
that transiently develops during chemistry.

We can infer a mechanistic model for the role of these adjunct metal
ions during catalysis (Fig. 4c). In the pre-catalytic ground state the primer
terminus, Mg_c, Mg_n, and incoming nucleotide are bound. De-protonation
of O3′ initiates nucleophilic attack at Pα and as O3′ approaches Pα it
sterically clashes with the non-bridging oxygens of Pα. This results in a
transition state and localized charges on O3′, Pα, and Oαβ that recruit a
metal ion to polarize Pα, thus facilitating O3′ attack by making Pα more
positive and Oαβ more negative. Such a role has been postulated for a
nucleotide insertion. The primer terminus is shown in grey with O3′ in red and
(Mg_c), nucleotide (Mg_n), and transition (Mg_t) metals. The transition metal is in
the same location as the previous ground-state metal-binding site. The localized
basic side chain in A- and B-family DNA polymerases. The accumulating
negative charge on Oαβ also promotes protonation of PPi, which
has been proposed to be a key rate-limiting step. Following product
formation this metal could transfer to the nearby product metal-binding
site, while the catalytic metal rapidly dissociates owing to the loss of a
coordinating ligand (O3′). Accordingly, the appearance/disappearance
of cations around the active site represents an elegant ballet of electrons
during DNA synthesis.

The structures captured here reveal how 8-oxo-dGTP escapes gen- 
eral polymerase discrimination checkpoints by modulating the highly 
charged DNA polymerase active site. Importantly, if 8-oxo-dGTP were 
to be inserted into a single-nucleotide gap, DNA ligase would be respon- 
sible for sealing the nick with a modified base pair. Abnormalities at the 
nick would hasten abortive ligation and stabilize the cytotoxic nick, thus 
increasing the probability of apoptotic cell signalling. Similarly, the
extension efficiency during error-free DNA synthesis from the modi- 
fied base pair would be reduced (that is, DNA synthesis would pause), 
promoting the generation of cytotoxic strand breaks. Therefore, the inser- 
tion and subsequent processing of 8-oxo-G in DNA offers a mechanism 
to manipulate the oxidative DNA damage response as well as target 
cancer cells that have an elevated metabolic rate. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


DNA sequences. To generate the 16-base oligonucleotide the following DNA
sequences were used for crystallization (coding nucleotide is underlined): template,
5′-CCG ACA/C GCG CAT CAG C-3′; primer, 5′-GCT GAT GCG C-3′; downstream,
5′-GTC GG-3′. The downstream sequence was 5′-phosphorylated. The
kinetic studies required extending the downstream and upstream sequences to employ
a 34-base oligonucleotide DNA substrate. The sequence of the template strand was
5′-GTA CCC GGG GAT CCG TAC A/CGC GCA TCA GCT GCA G-3′. The underlined
A/C represents the coding nucleotide as either an Ad or Cy. DNA substrates
for single-nucleotide gap filling DNA synthesis measurements were prepared by
annealing three purified oligonucleotides. Each oligonucleotide was suspended in
10 mM Tris-HCl, pH 7.4, and 1 mM EDTA and the concentrations were determined
from their ultraviolet absorbance at 260 nm. The annealing reactions were performed
by incubating a solution of primer with downstream and template oligonucleotides
(1:1.2:1.2 molar ratio, respectively) at 95 °C for 5 min, followed by 65 °C
for 30 min, and finally cooling 1 °C min⁻¹ to 10 °C in a PCR thermocycler.
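As a rough planning aid, the annealing protocol above can be expressed as a short calculation of oligonucleotide amounts and programme length. The Python sketch below rests on assumptions: the function names are hypothetical, the step from 95 °C to 65 °C is treated as instantaneous, and amounts are in arbitrary molar units. Only the 1:1.2:1.2 ratio, the hold times and the 1 °C min⁻¹ ramp come from the text.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to ramp between two temperatures at a constant rate."""
    return abs(start_c - end_c) / rate_c_per_min


def annealing_program_minutes():
    """Total programme time: 95 °C hold, 65 °C hold, then ramp to 10 °C."""
    hold_95 = 5.0    # min at 95 °C
    hold_65 = 30.0   # min at 65 °C
    cooling = ramp_minutes(65.0, 10.0, 1.0)  # 1 °C per min down to 10 °C
    return hold_95 + hold_65 + cooling


def oligo_amounts(primer_amount):
    """Amounts of each strand for a 1:1.2:1.2 primer:downstream:template mix."""
    return {
        "primer": primer_amount,
        "downstream": 1.2 * primer_amount,
        "template": 1.2 * primer_amount,
    }
```

Under these assumptions the cooling ramp takes 55 min, so the whole programme runs for about 90 min excluding instrument transition times.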
Protein expression, crystallization, and structural determination. Human wild-type,
R283A, and R283K DNA polymerase β were overexpressed in E. coli and purified
as described previously. Binary complex crystals with a templating cytosine or
adenine in a 1-nucleotide gapped DNA were grown as previously described. The
time-lapse crystallography was performed as previously described and is briefly
summarized here. Binary pol β:DNA complex crystals were first transferred to a
cryosolution containing 15% ethylene glycol, 50 mM imidazole, pH 7.5, 20% PEG3350,
90 mM sodium acetate, 3 mM 8-oxo-dGTP or dGTP, and 50 mM CaCl₂ for 1 h.
These ground state ternary complex crystals were then transferred to a cryosolution
containing 200 mM MgCl₂ or MnCl₂ for varying times. All reactions were
stopped by freezing the crystals at 100 K before data collection at the home source,
1.54 Å, or the Advanced Photon Source, 1.0 Å (Argonne National Laboratory).
In-house data collection was done on a SATURN92 CCD (charge-coupled device)
detector system mounted on a MiraMax-007HF rotating anode generator at a wavelength
of 1.54 Å. This allows for anomalous data detection after phasing by molecular
replacement. Remote data collection was done at the Southeast Regional
Collaborative Access Team BM-22 beamline at the Advanced Photon Source (Argonne
National Laboratory) at a wavelength of 1.0 Å, with the MAR225 area detector. Data
were processed and scaled using the HKL2000 software package. Initial models
were determined using molecular replacement with the open binary (Protein Data
Bank accession number 3ISB) or closed ternary (Protein Data Bank accession number
2FMS) structures of pol β, and all Rfree flags were taken from the starting model.
Refinement used PHENIX and model building used Coot. The metal-ligand
coordination restraints were generated by ReadySet (PHENIX) and not used until
the final rounds of refinement. Partial catalysis models were generated with both
the reactant and product species, and occupancy refinement was performed. The
figures were prepared in PyMol and all density maps were generated after performing
simulated annealing. Ramachandran analysis determined that 100% of non-glycine
residues lie in allowed regions and at least 97% in favoured regions.
Kinetic characterization. Steady-state kinetic parameters for single-nucleotide gap
filling reactions with wild-type enzyme were determined by initial velocity
measurements as described previously. Unless noted otherwise, enzyme activities were
determined using a standard reaction mixture containing 50 mM Tris-HCl, pH 7.4
(37 °C), 100 mM KCl, 10 mM MgCl₂, 1 mM dithiothreitol, 100 µg ml⁻¹ bovine serum
albumin, 10% glycerol, and 200 nM single-nucleotide gapped DNA. Enzyme
concentrations and reaction time intervals were chosen so that substrate depletion or
product inhibition did not influence initial velocity measurements. Owing to the
low activity of the Arg 283 mutants (alanine and lysine), the catalytic efficiencies
were determined by single-turnover analysis as described before except that the
enzyme/DNA ratio was 10 (ref. 37). Reactions were quenched with 0.3 M EDTA
and mixed with an equal volume of 95% formamide dye. The substrates and
products were separated on 16% denaturing (8 M urea) polyacrylamide gels. Since a
6-carboxyfluorescein 5′-labelled primer was used in these assays, the substrates
and products were quantified using a GE Typhoon 8600 phosphorimager in
fluorescence mode (532 nm laser, 526 short-pass filter). Kinetic parameters were
determined by fitting the rate data to a hyperbolic equation. When the observed rates
could not be saturated owing to weak substrate binding, the data were fitted to an
alternative form of the equation to extract catalytic efficiency ($k_{\mathrm{cat}}/K_{\mathrm{M}}$, best-fit initial
slope): $k_{\mathrm{obs}} = \dfrac{(k_{\mathrm{cat}}/K_{\mathrm{M}})\,[S]}{1 + [S]/K_{\mathrm{M}}}$. The mean and standard error of at
least two independent determinations are illustrated in plots that highlight
substrate discrimination.
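The alternative hyperbolic form can be illustrated with a minimal sketch. It only evaluates the equation; it is not the authors' fitting procedure, and the function name and all parameter values are hypothetical, chosen to show why the best-fit initial slope equals the catalytic efficiency.

```python
def k_obs(s, efficiency, km):
    """Observed rate at substrate concentration s.

    efficiency is k_cat/K_M (the initial slope) and km is K_M, so that
    k_obs = (efficiency * s) / (1 + s / km).
    """
    return (efficiency * s) / (1.0 + s / km)


# Hypothetical parameters for illustration only (not values from the paper).
EFFICIENCY = 0.02  # k_cat/K_M, in uM^-1 s^-1
KM = 500.0         # K_M, in uM

# Far below K_M the curve is close to linear, with slope k_cat/K_M.
low_s = 1.0
initial_slope = k_obs(low_s, EFFICIENCY, KM) / low_s

# Far above K_M the curve approaches k_cat = (k_cat/K_M) * K_M.
plateau = k_obs(1e9, EFFICIENCY, KM)
```

Because the efficiency appears as a single fitted parameter, it remains well determined even when the data never approach the plateau.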

Quantum mechanical analysis. The three systems are shown in Extended Data Fig. 4
and described here. They are (1) 8-oxo-dGTP(anti) with the Ca_c, Ca_n and Ca_g ions, the
eight coordinating water molecules, and the three acetate ions (mimicking Asp 190,
192, and 256 of pol β); (2) all atoms in system 1 except for the Ca_g and its
coordinating water molecules; (3) dGTP(anti) with the Ca_c, Ca_n and Ca_g ions, the
eight coordinating water molecules, and the three acetate ions (mimicking Asp 190, 192, and 256
of pol β). The base of 8-oxo-dGTP was optimized under minimum constraints to
keep the geometries closer to the crystallographic structure at the B3LYP level of
theory with the 6-31+g* basis set using the program Gaussian09.D01 (ref. 38). All
charges were calculated using the ChelpG procedure in Gaussian09.D01 at the
6-31++g** basis set level.

Molecular dynamics simulations. Two reference systems were simulated by molecular
dynamics for 80 ns. System 1 was prepared using the entire ground state
8-oxo-dGTP(anti):Cy crystal structure (0 s). It consisted of the DNA, 8-oxo-dGTP(anti),
pol β, 302 crystal water molecules, and three Mg²⁺ ions that replaced the Ca²⁺ ions
in the crystal structure at the catalytic, nucleotide, and ground metal-binding sites.
System 2 was prepared as system 1, but with 8-oxo-dGTP replaced by dGTP(anti).
For all structures, missing atoms were added using CHARMM, and the disordered
residues 1-10 of polymerase were added using Accelrys Discovery Studio. Both
systems were solvated with TIP3P water molecules. The smallest image distance between
the solute and the faces of the periodic cubic cell was set to 12 Å. The total number
of water atoms was 49,089 and total number of atoms 55,521. Neutralizing ions
(Na⁺) and 150 mM NaCl were added to both systems. All of the Na⁺ and Cl⁻ ions
were placed at least 8 Å away from each other, pol β, and the DNA. Both systems
were minimized with fixed positions for all heavy atoms of pol β and DNA for
10,000 steps. The equilibration process was started with a 200 ps simulation at 300 K
using Langevin dynamics, while keeping all the heavy atoms of pol β and DNA
fixed. This was followed by unconstrained minimization consisting of 20,000 steps.
The systems were then equilibrated for 500 ps at constant pressure and temperature.
Pressure was maintained at 1 atmosphere using the Langevin piston method.
The temperature was maintained at 300 K using weakly coupled Langevin dynamics
of non-hydrogen atoms with a damping coefficient of 10 ps⁻¹. The systems were
simulated in periodic boundary conditions with full electrostatics computed using
the particle mesh Ewald method. Short-range non-bonded terms were evaluated
at every step using a 12 Å cutoff for van der Waals interactions and a smooth
switching function. The production simulations were performed for 80 ns with a 2 fs timestep.
The minimization, equilibration, and production molecular dynamics simulations
were performed by the NAMD simulation package with the CHARMM27 all-atom
force field. The force field parameters for 8-oxo-G were adopted from earlier
works. The average distances and standard deviations were calculated using
molecular dynamics trajectories in the 50-80 ns time range.
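The windowed averaging described in the last sentence can be sketched as follows. This is an illustration only: the function name, the frame spacing and the distance values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def window_stats(times_ns, distances, start_ns=50.0, end_ns=80.0):
    """Return (mean, sample standard deviation) of distances inside a time window."""
    selected = [d for t, d in zip(times_ns, distances) if start_ns <= t <= end_ns]
    if len(selected) < 2:
        raise ValueError("need at least two frames inside the window")
    return mean(selected), stdev(selected)


# Hypothetical trajectory: one frame every 10 ns, distances in angstroms.
times_ns = [0, 10, 20, 30, 40, 50, 60, 70, 80]
distances = [3.0, 2.8, 2.6, 2.4, 2.3, 2.1, 2.2, 2.3, 2.2]

# Only the frames at 50, 60, 70 and 80 ns contribute to the statistics.
avg, sd = window_stats(times_ns, distances)
```

Restricting the statistics to the final 30 ns discards the early part of the run, during which the structure may still be relaxing away from the crystal geometry.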
Extended Data Figure 1 | Mutagenic 8-oxo-dGTP insertion opposite
adenine. a, Overlay of the ternary complex for 8-oxo-dGTP(syn):Ad generated
with Ca²⁺ or a dideoxy-terminated primer (Protein Data Bank accession
number 3MBY) is shown in yellow and green, respectively (root mean squared
deviation of 0.17 Å). b, The pol β active site is shown with a 2Fo − Fc map
contoured at 1.5σ after a 20 s soak. Key active site residues are indicated and
Mg²⁺ ions are shown as red spheres. The reactant 8-oxo-dGTP and product
8-oxo-dGMP are shown in green and yellow respectively. c, A focused view
of b with the density removed. Coordinating waters (blue) and their distances
(in ångströms) to active site metals are shown. d, The active site following a 40 s
soak is shown with an omit map (3σ) for the Mg_p and coordinating waters.
e, The coordination distances (in ångströms) for the Na_c, Mg_n, and Mg_p
metals are indicated for the closed product complex after a 40 s soak. f, The
8-oxo-dGMP(anti):Ad contact between N3 and N6 of 8-oxo-dGMP and Ad
respectively is shown for the open product complex after a 90 s soak.
Extended Data Figure 2 | Pre-catalytic ground state with 8-oxo-dGTP and
templating cytosine after a 5 s soak in MnCl₂. a, The pre-catalytic pol β active
site is shown with an omit map (3σ). The ground state metal (Mn_g) has
been removed for clarity. b, The view is a 90° rotation relative to a. An overlay of
the 8-oxo-dGTP(anti) with Ca²⁺ and Mn²⁺ is shown in yellow and purple
respectively. The anomalous density map contoured at 5σ for the Mn²⁺ ions is
shown in purple. The Mn_g coordinating water molecules are shown in blue and
the distances (in ångströms) are indicated.
Extended Data Figure 3 | Reaction with 8-oxo-dGTP opposite templating
cytosine. a, Focused view of the active site following a 40 s soak is shown with
key residues indicated; density has been removed for clarity (see Fig. 2e for
density). b, An omit map (3σ) for Ca_g is shown. Coordinating waters are shown
in blue (distances in ångströms). c, An omit map (3σ) for Mg_p is shown.
Extended Data Figure 4 | Quantum mechanical computational models for
8-oxo-dGTP(anti) and dGTP(anti). The models used for the quantum 
mechanical computational studies with the calcium ions, oxygen, phosphates, 
carbon, nitrogen, and protons shown in green, red, orange, grey, blue, and 
white, respectively. The key atoms and Asp 190, Asp 192, and Asp 256 mimics 
are indicated. a, The 8-oxo-dGTP(anti) with three calcium ions and eight water 
molecules. b, The 8-oxo-dGTP(anti) with two calcium ions and three water 
molecules. c, The dGTP(anti) with two calcium ions and three water molecules. 
Extended Data Figure 5 | Molecular dynamics simulation analysis of
8-oxo-dGTP(anti) and dGTP(anti) opposite Cy. a, The 8-oxo-dGTP(anti)
opposite Cy at 80 ns superimposed upon the initial structure. A multicolour
code based on atom type is used for the final molecular dynamics structure,
whereas the reference initial structure is shown in light grey. The catalytic
(Mg_c), nucleotide (Mg_n), and ground (Mg_g) magnesium metal ions are
shown in green, and average distances over the course of the simulation are
indicated for Pα-O3′ and Mg_c-O3′. b, Distance distributions between
hydrogen atoms in the water shell and O8 in the 8-oxo-dGTP(anti):Cy
simulation. A snapshot of the 8-oxo-dGTP, Mg_g, and water shell (W(g1-g5)) is
plotted at top. Black and red dotted lines indicate Mg_g coordination and a
hydrogen-bonding interaction between a water molecule and O8, respectively.
Four of the five water molecules in the water shell (W(g1-g4)) contribute to
hydrogen-bonding interactions with O8. Blue and orange lines indicate
distances between hydrogen atoms in each water molecule and O8. The red
line in the bottom plot indicates the minimum distance between hydrogen
atoms in the water shell and O8. c, The dGTP(anti) opposite Cy at 80 ns
superimposed upon the initial structure (grey). Distances and ion labelling are
as for a. d, Root mean squared deviation of the evolving molecular dynamics
structure for the entire polymerase/DNA complex (top) and for the active site
only (bottom), with respect to the crystal structure.
Extended Data Table 1 | Data collection and refinement statistics of pol β 8-oxo-dGTP insertion opposite adenine

| | 0 s | 20 s | 40 s | 90 s |
|---|---|---|---|---|
| State | Ground state | Reactant state | Product state | Product state |
| Templating base | Adenine | Adenine | Adenine | Adenine |
| Data collection | | | | |
| Wavelength | 1.00 | 1.00 | 1.00 | 1.00 |
| Space group | P2₁ | P2₁ | P2₁ | P2₁ |
| Cell dimensions a, b, c (Å) | 50.9, 79.9, 55.5 | 50.8, 79.9, 55.4 | 50.9, 80.4, 55.4 | 55.1, 80.8, 55.5 |
| Cell dimensions α, β, γ (°) | 90, 107.6, 90 | 90, 107.6, 90 | 90, 107.7, 90 | 90, 109.6, 90 |
| Resolution (Å) | 50.0-1.90 | 50.0-1.88 | 50.0-1.98 | 50.0-2.35 |
| Rsym or Rmerge (%) | 6.8 (52.8) | 5.5 (34.2) | 6.0 (28.4) | 5.2 (33.4) |
| I/σI | 28.9 (4.4) | 23.3 (2.3) | 24.2 (2.8) | 24.3 (2.3) |
| Completeness (%) | 98.0 (97.7) | 95.6 (68.9) | 96.6 (72.1) | 97.3 (77.0) |
| Redundancy | 5.7 (4.0) | 3.3 (1.8) | 3.5 (2.3) | 3.5 (2.5) |
| Refinement | | | | |
| Resolution (Å) | 1.90 | 1.88 | 1.98 | 2.35 |
| No. reflections | 57079 | 56411 | 51562 | 34952 |
| Rwork/Rfree | 17.5/23.2 | 17.4/21.4 | 17.7/23.9 | 20.8/26.4 |
| No. atoms: protein | 2677 | 2673 | 2673 | 2593 |
| No. atoms: DNA | 659 | 681 | 681 | 681 |
| No. atoms: water | 341 | 278 | 274 | 86 |
| B-factors (Å²): protein | 25.9 | 29.2 | 26.2 | 40.5 |
| B-factors (Å²): DNA/8-oxo/PPi | 26.2/18.7/- | 36.5/18.2/25.1 | 35.0/20.9/28.1 | 38.2/57/- |
| B-factors (Å²): water | 27.8 | 33.1 | 31.57 | 30.4 |
| R.m.s. deviations: bond length (Å) | 0.01 | 0.01 | 0.01 | 0.01 |
| R.m.s. deviations: bond angles (°) | 1.20 | 1.00 | 1.09 | 1.16 |
| Reaction ratio | | | | |
| Pol β conformation | closed | closed | closed | open |
| Ratio of RS/PS† | 1.0/0 | 0.4/0.6 | 0/1.0 | 0/1.0 |
| Occupancy | | | | |
| Metal C/N/G/P‡ | 1.0/1.0/-/- | 1.0/1.0/-/- | -/1.0/-/0.7 | - |
| PPi | - | 0.6 | 1.0 | - |
| PDB ID | 4UAW | 4UAZ | 4UAY | 4UB1 |

* Highest resolution shell is shown in parentheses.
† RS and PS represent the reactant state and product states, respectively.
‡ Metal C/N/G/P refers to the catalytic, nucleotide, ground, and product metal-binding sites, respectively.
Extended Data Table 2 | Key DNA and structural parameters near the pol β active site*

| Complex | Primer† | Incoming |
|---|---|---|
| dC-dGTP | C3′-endo | C3′-endo |
| dC-8oxodGTP | C3′-endo | C4′-exo |
| dA-8oxodGTP | C3′-endo | C3′-endo |
| dG-dATP§ | C2′-endo | C3′-endo |
| dA-8oxodGTP‖ | C2′-endo | C4′-exo |

* Parameters extracted with 3DNA.
† Primer terminus.
‡ Nascent base pair.
§ Protein Data Bank accession number 4LVS.
‖ Protein Data Bank accession number 3MBY.
Extended Data Table 3 | Data collection and refinement statistics of pol β insertion opposite cytosine with 8-oxo-dGTP and dGTP

| | 0 s | 40 s | 60 s | 120 s | 5 s (MnCl₂) | 0 s (dGTP) |
|---|---|---|---|---|---|---|
| State | Ground state | Reactant state | Product state | Product state | Ground state | Ground state |
| Templating base | Cytosine | Cytosine | Cytosine | Cytosine | Cytosine | Cytosine |
| Data collection | | | | | | |
| Wavelength | 1.54 | 1.00 | 1.54 | 1.54 | 1.54 | 1.54 |
| Space group | P2₁ | P2₁ | P2₁ | P2₁ | P2₁ | P2₁ |
| Cell dimensions a, b, c (Å) | 50.7, 79.8, 55.5 | 50.8, 79.8, 55.6 | 50.7, 80.1, 55.4 | 55.1, 79.6, 55.7 | 55.1, 77.7, 55.1 | 50.6, 79.4, 55.5 |
| Cell dimensions α, β, γ (°) | 90, 107.5, 90 | 90, 107.6, 90 | 90, 107.8, 90 | 90, 109.7, 90 | 90, 114.2, 90 | 90, 107.5, 90 |
| Resolution (Å) | 50.0-1.96 | 50.0-1.85 | 50.0-2.06 | 50.0-2.51 | 50.0-2.15 | 50.0-1.96 |
| Rsym or Rmerge (%) | 6.0 (42.1) | 5.4 (27.5) | 8.3 (51.1) | 7.6 (62.3) | 6.9 (51.6) | 6.8 (51.3) |
| I/σI | 20.4 (2.1) | 20.9 (2.9) | 16.3 (2.1) | 17.6 (2.2) | 13.3 (2.3) | 19.6 (2.2) |
| Completeness (%) | 98.4 (87.7) | 98.1 (79.0) | 97.4 (91.2) | 99.5 (99.7) | 99.6 (99.5) | 99.5 (95.8) |
| Redundancy | 3.0 (1.7) | 3.2 (1.9) | 4.7 (3.3) | 3.6 (3.5) | 3.2 (2.9) | 4.8 (2.3) |
| Refinement | | | | | | |
| Resolution (Å) | 2.0 | 1.90 | 2.06 | 2.51 | 2.15 | 1.96 |
| No. reflections | 51032 | 63744 | 21786 | 25634 | 39111 | 49693 |
| Rwork/Rfree | 18.2/25.4 | 17.6/22.3 | 19.1/25.2 | 20.5/27.5 | 21.2/27.4 | 17.9/22.8 |
| No. atoms: protein | 2673 | 2673 | 2673 | 2593 | 2673 | 2674 |
| No. atoms: DNA | 627 | 661 | 661 | 661 | 627 | 627 |
| No. atoms: water | 304 | 293 | 172 | 63 | 101 | 235 |
| B-factors (Å²): protein | 27.7 | 28.8 | 32.3 | 38.9 | 42.1 | 36.1 |
| B-factors (Å²): DNA/8-oxo/PPi | 36.6/18.8/- | 38.0/21.0/20.5 | 41.4/25.8/28.1 | 34.8/65.4/- | 44.1/38.3/- | 42.6/25.0/- |
| B-factors (Å²): water | 29.4 | 39.5 | 30.0 | 20.0 | 31.4 | 32.8 |
| R.m.s. deviations: bond length (Å) | 0.01 | 0.01 | 0.006 | 0.01 | 0.007 | 0.009 |
| R.m.s. deviations: bond angles (°) | 1.17 | 1.08 | 1.20 | 1.26 | 1.1 | 1.17 |
| Reaction ratio | | | | | | |
| Pol β conformation | closed | closed | closed | open | closed | closed |
| Ratio of RS/PS† | 1.0/0 | 0.6/0.4 | 0/1.0 | 0/1.0 | 1.0/0 | 1.0/0 |
| Occupancy | | | | | | |
| Metal C/N/G/P‡ | 1.0/1.0/1.0/- | 1.0/1.0/0.6/0.4 | -/1.0/-/0.6 | - | 1.0/1.0/1.0/- | 1.0/1.0/-/- |
| PPi | - | 0.4 | 1.0 | - | - | - |
| PDB ID | 4UBC | 4UBB | 4UB3 | 4UB2 | 4UB5 | 4UB4 |

* Highest resolution shell is shown in parentheses.
† RS and PS represent the reactant state and product states, respectively.
‡ Metal C/N/G/P refers to the catalytic, nucleotide, ground, and product metal-binding sites, respectively.
Extended Data Table 4 | Quantum mechanical point charges for each atom of 8-oxo-dGTP(anti) and dGTP(anti) with either two or three calcium ions

Number  Atom  8-oxodGTP (3 metals)  8-oxodGTP (2 metals)  dGTP (2 metals)  8-oxodGTP (MD)
1 H 0.41 0.39 0.4 N/A 
2 O -0.67 -0.74 -0.68 -0.9 
3 P 1.10 
4 ie) -0.9 
5 O -0.9 
6 O -0.86 
7 P 1.50 
9 -0.82 
10 -0.74 
11 15 
12 -0.82 
13 2 -0.82 
14 -0.62 
15 Cc -0.08 
16 H 0.09 
17 H 0.09 
18 c 0.16 
19 H 0.09 
20 0.14 
21 H 0.09 
22 O -0.66 
23 H 0.43 
24 Cc -0.18 
25 0.09 
26 H 0.09 
27 0.16 
28 H 0.09 
29 O -0.5 
30 N 0.16 
31 Cc 0.41 
32t O/H -0.64 
33 N -0.34 
34+ H 0.38 
35 Cc 0 
36 C 0.33 
37 O -0.58 
38 N -0.23 
39 0.32 
40 c 0.57 
4 N -0.87 
42 H 0.41 
43 H 0.42 
44 -0.57 
45 Cc 0.23 
The molecular dynamics point charges used with three magnesium ions are shown for reference in the last column. The units are in electron charge (e) and the key is shown for reference.
* The position of each atom is shown in the chemical structure cartoon below the table.
† The oxygen or proton corresponds to 8-oxo-dGTP and dGTP respectively.
‡ Proton corresponds to 8-oxo-dGTP only.
Extended Data Table 5 | Average distances in the active sites in 8-oxo-dGTP(anti) and dGTP(anti)

| Distance | 8-oxodGTP (Å) | dGTP (Å) |
|---|---|---|
| Mg_c-OD2 (Asp 190) | 1.81 ± 0.04 | 1.81 ± 0.04 |
| Mg_c-OD1 (Asp 192) | 1.80 ± 0.04 | 1.81 ± 0.04 |
| Mg_c-OD2 (Asp 256) | 1.80 ± 0.04 | 1.81 ± 0.04 |
| Mg_c-O1A (Pα) | 3.18 ± 0.13 | 3.34 ± 0.25 |
| Mg_c-O3′ (C10) | 2.19 ± 0.13 | 4.78 ± 0.42 |
| Mg_c-W(c) | 1.95 ± 0.05 | 2.00 ± 0.07 |
| Mg_c-W(c)* | N/A | 1.98 ± 0.07 |
| Pα-O3′ (C10) | 3.67 ± 0.13 | 5.25 ± 1.01 |
| Mg_n-OD1 (Asp 190) | 1.86 ± 0.05 | 1.86 ± 0.05 |
| Mg_n-OD2 (Asp 192) | 1.87 ± 0.05 | 1.88 ± 0.05 |
| Mg_n-O1A (Pα) | 1.90 ± 0.06 | 1.92 ± 0.06 |
| Mg_n-O2B (Pβ) | 1.90 ± 0.06 | 1.91 ± 0.06 |
| Mg_n-O3G (Pγ) | 1.85 ± 0.05 | 1.84 ± 0.05 |
| Mg_n-W(n) | 2.00 ± 0.06 | 2.03 ± 0.07 |
| Mg_g-O2A (Pα) | 1.82 ± 0.04 | 1.82 ± 0.04 |
| Mg_g-W(g1) | 1.98 ± 0.06 | 1.99 ± 0.06 |
| Mg_g-W(g2) | 1.98 ± 0.06 | 1.99 ± 0.06 |
| Mg_g-W(g3) | 2.00 ± 0.07 | 1.99 ± 0.06 |
| Mg_g-W(g4) | 2.00 ± 0.07 | 2.01 ± 0.07 |
| Mg_g-W(g5) | 1.99 ± 0.07 | 1.98 ± 0.06 |
| W(g)-O8 (8-oxodGTP) | 1.73 ± 0.11 | |

W(c), W(n), W(g1-g5) are water molecules bound to the Mg_c, Mg_n and Mg_g, respectively. The Mg_c in the dGTP(anti) system establishes a new coordination with a water molecule (W(c)*) after 40 ns. The average
values are calculated using the 50-80 ns range of molecular dynamics trajectories.
Structural insight into autoinhibition and histone
H3-induced activation of DNMT3A


Xue Guo, Ling Wang, Jie Li, Zhanyu Ding, Jianxiong Xiao, Xiaotong Yin, Shuang He, Pan Shi, Liping Dong,
Guohong Li, Changlin Tian, Jiawei Wang, Yao Cong & Yanhui Xu

DNA methylation is an important epigenetic modification that is
essential for various developmental processes through regulating gene
expression, genomic imprinting, and epigenetic inheritance. Mammalian
genomic DNA methylation is established during embryogenesis
by de novo DNA methyltransferases, DNMT3A and DNMT3B,
and the methylation patterns vary with developmental stages and cell
types. DNA methyltransferase 3-like protein (DNMT3L) is a catalytically
inactive paralogue of DNMT3 enzymes, which stimulates
the enzymatic activity of Dnmt3a. Recent studies have established
a connection between DNA methylation and histone modifications,
and revealed a histone-guided mechanism for the establishment of
DNA methylation. The ATRX-DNMT3-DNMT3L (ADD) domain
of Dnmt3a recognizes unmethylated histone H3 (H3K4me0).
The histone H3 tail stimulates the enzymatic activity of Dnmt3a in
vitro, whereas the molecular mechanism remains elusive. Here
we show that DNMT3A exists in an autoinhibitory form and that
the histone H3 tail stimulates its activity in a DNMT3L-independent
manner. We determine the crystal structures of DNMT3A-DNMT3L
(autoinhibitory form) and DNMT3A-DNMT3L-H3 (active form)
complexes at 3.82 and 2.90 Å resolution, respectively. Structural and
biochemical analyses indicate that the ADD domain of DNMT3A
interacts with and inhibits enzymatic activity of the catalytic domain
(CD) through blocking its DNA-binding affinity. Histone H3 (but not
H3K4me3) disrupts ADD-CD interaction, induces a large movement
of the ADD domain, and thus releases the autoinhibition of DNMT3A.
The finding adds another layer of regulation of DNA methylation to
ensure that the enzyme is mainly activated at proper targeting loci
when unmethylated H3K4 is present, and strongly supports a negative
correlation between H3K4me3 and DNA methylation across the
mammalian genome. Our study provides a new insight into an
unexpected autoinhibition and histone H3-induced activation of the
de novo DNA methyltransferase after its initial genomic positioning.

To investigate how DNMT3A activity is regulated, we performed an
in vitro DNA methylation assay using recombinant DNMT3A2 (residues
224-912), a catalytically active variant of DNMT3A consisting of
Pro-Trp-Trp-Pro (PWWP), ADD, and CD domains (Fig. 1a). H3K4me0
(but not H3K4me3) peptide significantly stimulated the enzymatic
activities of DNMT3A2 or DNMT3A2 in complex with the CD-like domain
of DNMT3L (designated C^DNMT3L) (Fig. 1b and Extended Data Fig. 1a, b).
A similar effect was observed for DNMT3A2 or DNMT3A2-C^DNMT3L
using recombinant poly-nucleosomes carrying unmethylated histone
H3 or H3K4me3 mimic (H3Kc4me3) (Fig. 1b and Extended Data Fig. 1c).
Thus, the enzymatic activity of DNMT3A2 is stimulated by histone H3
(but not H3K4me3) tail either in the form of free peptide, or within
nucleosome, in a DNMT3L-independent manner.


We next characterized the critical regions for histone H3-induced
activation of DNMT3A. H3K4me0 peptide activated DNMT3A2 and
ADD-CD (residues 476-912) proteins with comparable (approximately
sixfold) activity enhancement, but did not stimulate activity of the CD
domain (residues 627-912) in the absence or presence of C^DNMT3L
(Extended Data Fig. 1d, e). ADD-CD protein with histone H3 peptide
(residues 1-20) fused at the amino (N) terminus (H3-ADD-CD) showed
significantly higher activity than ADD-CD, and could not be further
activated by H3K4me0 peptide (Extended Data Fig. 1e). These results
collectively indicate that the ADD domain (but not the PWWP domain)
is required for histone H3-induced activation of DNMT3A.
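
The fold enhancement quoted above is a relative activity: per the Fig. 1 legend, each activity is normalized to the basal activity of the same protein without H3 peptide, with s.d. from triplicate experiments. A minimal sketch of that normalization follows. The function name and the readings are hypothetical, and taking the s.d. over fold changes computed against the mean basal activity is an assumption about how the error bars were derived.

```python
from statistics import mean, stdev


def relative_activity(stimulated, basal):
    """Normalize replicate activities to the mean basal activity.

    Returns (mean fold change, sample s.d. of the fold changes).
    """
    basal_mean = mean(basal)
    folds = [value / basal_mean for value in stimulated]
    return mean(folds), stdev(folds)


# Hypothetical triplicate readings in arbitrary units (not data from the paper).
basal = [100.0, 110.0, 90.0]          # no H3 peptide
with_h3k4me0 = [580.0, 600.0, 620.0]  # plus H3K4me0 peptide

fold, sd = relative_activity(with_h3k4me0, basal)
```

Normalizing each protein to its own basal activity is what allows constructs with very different intrinsic activities, such as ADD-CD and the isolated CD domain, to be compared on one scale.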

Notably, the CD domain or CD-C^DNMT3L complex showed comparable
activity to that of ADD-CD or ADD-CD-C^DNMT3L activated by
histone H3 peptide (Extended Data Fig. 1d, e), suggesting that the ADD
domain inhibits the activity of the CD domain and histone H3 releases
this inhibition. In support of this hypothesis, addition of ADD-linker
(residues 476-626, Fig. 1a), but not H3-ADD-linker fusion protein,
inhibited the activity of the CD domain (Fig. 1c). These results confirm the
existence of autoinhibition of DNMT3A, in which ADD-linker directly
inhibits the enzymatic activity of the CD domain, and that this
inhibition is released by histone H3 tail.

We determined the crystal structure of ADD-CD of DNMT3A in complex
with C^DNMT3L (ADD-CD-C^DNMT3L) at 3.82 Å resolution (Fig. 1d
and Extended Data Table 1). Although only one ADD-CD-C^DNMT3L
complex was observed within an asymmetric unit of the crystals,
gel-filtration analysis indicated tetramer formation in solution. Two
ADD-CD-C^DNMT3L complexes form a dimer of dimers via the CD-CD
interaction across a crystallographic two-fold symmetry axis (Extended Data Fig. 2).
The complex structure adopts an 'X' shape with two CD domains located
in the centre and the ADD and C^DNMT3L domains located in the four
corners. In the ADD-CD-C^DNMT3L structure, CD-C^DNMT3L adopts a
similar fold to that of the mouse CD^Dnmt3a-C^Dnmt3L structure (Extended
Data Fig. 3a).

The ADD and CD domains fold into two individual structural mod- 
ules connected by the linker (Fig. 1d). The linker forms a twisted helix 
followed by an extended loop, which packs against a hydrophobic surface 
of the CD domain (Extended Data Fig. 3b, c). A loop region (residues 
526-533) extends out of the ADD domain and inserts into a pocket 
in the CD domain (Fig. 1e and Extended Data Fig. 3d). This pocket is
mainly formed by basic residues R790, R792, H789, and R831 of the 
CD domain. Three acidic residues (D529/D530/D531) and hydropho- 
bic residues (Y526/Y528/Y533) of the ADD domain are brought into 
close proximity to the pocket. All residues that are involved in the intra- 
molecular interaction are highly conserved in DNMT3A/3B, suggest- 
ing a conserved ADD-CD interface among DNMT3A/B sub-family 
Figure 1 | Structure of the DNMT3A-DNMT3L complex in autoinhibitory
form. a, Colour-coded domain architecture of human DNMT3A and
DNMT3L. b, Relative activities of DNMT3A2 using peptides or reconstituted
nucleosomes as substrates in the absence or presence of C^DNMT3L. c, Activities
of the CD domain in the absence or presence of ADD-linker or H3-ADD-linker
fusion proteins. d, Ribbon representations of the overall structure of the
ADD-CD-C^DNMT3L complex. AdoHcy molecules are shown in stick representation;
zinc cations are shown as grey balls. The colour scheme is the same as in a and
used in all structural figures. e, Close-up view of the ADD-CD interface.
Critical residues for the interactions are shown in stick representation. Relative
activities were calculated according to basal activity for each protein in the
absence of H3 peptide. Error bars, s.d. for triplicate experiments.


members, but not DNMT3L (Extended Data Fig. 4). Structural comparison
also indicated that ADD^DNMT3L would have steric hindrance
with C^DNMT3L if DNMT3L adopts a similar conformation to that of
DNMT3A (Extended Data Fig. 3e).

The intramolecular interaction was verified in the in vitro glutathione
S-transferase (GST) pull-down assays, in which ADD-linker bound to
the CD domain (Fig. 2a and Extended Data Fig. 5a-e). Mutations D529A,
D531A, D529A/D531A, and Y526A/Y528A of ADD-linker decreased
the binding affinity to the CD domain, supporting their important role
in mediating the ADD-CD interaction. In contrast, mutations R556A,
M548W, D530A, and Y533A of ADD-linker showed little effect on the
ADD-CD interaction (Fig. 2a and Extended Data Fig. 5a, b). Mutation R790A/
R792A of the CD domain also impaired its binding affinity to the
ADD-linker protein (Extended Data Fig. 5c). H3K4me0 (but not H3K4me3)
peptide hampered ADD-CD interaction, suggesting that histone H3
tail releases autoinhibition of DNMT3A through disrupting this
intramolecular interaction (Fig. 2b and Extended Data Fig. 5d).

We next mapped the critical regions for the autoinhibition of DNMT3A
in the presence of C^DNMT3L (Fig. 2c). Compared with ADD-CD (Del-N476),
Del-N522 showed a slight increase in enzymatic activity, whereas
Del-N550, Del-N587, and Del-N610 significantly increased their activ- 
ities (Fig. 2d). Although ADD and ADD-linker showed comparable bind- 
ing affinity to the CD domain (Extended Data Fig. 5e), only ADD-linker 
(ADD-626) maintained the inhibitory function, while all other truncated 
proteins markedly decreased their inhibition on the activity of the CD 
domain (Fig. 2e). Replacement of residues 621-632 by a 12-residue Gly 
and Ser (GS) linker in ADD-CD partly released autoinhibition and showed 
less than twofold activity enhancement upon histone H3 stimulation 
(Extended Data Fig. 5f). The results indicate that the ADD domain and 
linker are both important for the autoinhibition of DNMT3A. 

We next tested whether residues on the ADD-CD interface are impor- 
tant for the autoinhibition of DNMT3A. Compared with wild-type pro- 
tein, mutations D529A, D531A, and Y526A/Y528A of the ADD-linker 
could barely inhibit the activity of the CD domain (Fig. 2f and Ex- 
tended Data Table 2). Mutations D529A, D531A, and Y526A/Y528A of 
DNMT3A2 showed enzymatic activities that are comparable to or slightly 
lower than that of corresponding mutants activated by histone H3 pep- 
tide (Fig. 2g). A more obvious release of autoinhibition was observed in 
assays using recombinant poly-nucleosomes as substrate (Fig. 2h). As a 
negative control, mutation M548W of DNMT3A2 remained in an inhib- 
itory form and was not activated by histone H3 tail because the mutant 
could not bind to histone H3 (ref. 18). Taken together, the ADD domain 
and linker function together to inhibit the activity of the CD domain, 
and ADD-CD association is essential for ADD-mediated autoinhibition 
of DNMT3A. 

HhaI is a well-characterized bacterial DNA (cytosine-5) methyltransferase.
Structural comparison of ADD-CD-C^DNMT3L and HhaI-DNA
(Protein Data Bank (PDB) accession number 1MHT) complexes indicates
that their catalytic domains adopt a similar fold, and suggests that
loops L1 and L2 of DNMT3A are potential regions for DNA interaction
(Fig. 3a-c and Extended Data Fig. 4). Consistently, mutating basic
residues K831, R836/N838, K841, and K844 on loop L2 impaired the
enzymatic activity of DNMT3A (Fig. 3c and Extended Data Fig. 6a).
In the ADD-CD-C^DNMT3L structure, the ADD domain is located close
to the loop L2 and may thus generate steric hindrance for DNA
interaction (Fig. 3c).

In support of the analysis above, ADD-CD showed markedly decreased
DNA-binding affinity relative to the CD domain, whereas H3-ADD-CD
showed affinity comparable to that of the CD domain (Fig. 3d). Histone
H3 peptide partly restored the DNA-binding affinity of ADD-CD or
DNMT3A2; the incomplete rescue may have resulted from the
relatively weak interaction between histone H3 peptide and DNMT3A2
(Fig. 3d, e and Extended Data Fig. 6b). Consistent with their effect on
the release of autoinhibition, mutations D529A, D531A, and Y526A/
Y528A of DNMT3A2-C^DNMT3L partly rescued the DNA interaction
(Fig. 3e), and histone H3 (but not H3K4me3) peptide increased the
complex formation of ADD-CD-DNA, but not CD-DNA (Fig. 3f). Similar
results were observed in electrophoretic mobility-shift assays (Extended
Data Fig. 6c). Taken together, the ADD domain inhibits the activity of
the CD domain by decreasing its DNA-binding affinity. The DNA-binding
affinity is restored by fusion of the histone H3 tail at the N terminus of ADD-CD,
and is partly restored by addition of histone H3 peptide or mutations
of residues on the ADD-CD interface.

We also determined the crystal structure of ADD-CD-C^DNMT3L in
complex with histone H3 peptide (residues 1-12) at 2.90 Å resolution
(Extended Data Table 1). The complex structure adopts a butterfly shape,
with the ADD and C^DNMT3L domains resembling the wings (Fig. 4a). In
the complex structure, two ADD-CD-C^DNMT3L-H3 complexes form a
dimer of dimers through the CD-CD interface in a pseudo-two-fold symmetry.
CD-C^DNMT3L adopts a similar fold to that of ADD-CD-C^DNMT3L
(autoinhibitory form) and mouse CD^Dnmt3a-C^Dnmt3L structures, whereas
the position of the ADD domain is obviously different in the structures
of DNMT3A in autoinhibitory and active forms (Fig. 4a, b and Extended
Data Fig. 7a, b).

In the ADD-CD-C^DNMT3L-H3 structure, the ADD-CD interaction is
mediated by a network of hydrogen bonds and hydrophobic interactions
(Extended Data Fig. 7c, d). A similar domain organization was observed
in the DNMT3L-H3 structure, suggesting a conserved ADD-CD
Figure 2 | ADD-CD interactions are important for autoinhibition of DNMT3A.
a, GST pull-down assays for the ADD-CD interactions. Recombinant CD
(residues 627-912) proteins were incubated with wild-type or mutant
GST-ADD-linker proteins immobilized on glutathione resin. The bound proteins
were analysed using SDS-polyacrylamide gel electrophoresis (SDS-PAGE)
and Coomassie blue staining (Extended Data Fig. 5a). The assays were
quantified by band densitometry. Error bars, s.d. for triplicate
experiments. b, GST pull-down assays in the absence or presence of histone
H3 peptide (H3K4me0 or H3K4me3) as indicated in Extended Data Fig. 5d.
c, Schematic representation of DNMT3A proteins used for in vitro
methyltransferase activity assays. d, Effect of [...] enzymatic activity.
e, Activities of the CD domain measured in the presence of various carboxy
(C)-terminal deletions of ADD-linker proteins. f, Activities of the CD domain
measured in the presence of wild-type and mutant ADD-linker proteins.
g, h, Activities of wild-type or mutant DNMT3A2 measured using naked
DNA (g) or poly-nucleosomes (h) as substrate. Error bars, s.d. for triplicate
experiments.
Construct labels in the figure: ADD-609 (residues 476-609), ADD-614 (476-614),
ADD-619 (476-619), ADD-623 (476-623), ADD-626 (476-626).
interface among DNMT3 family members when they adopt the active 
form (Extended Data Fig. 7e). The histone H3 peptide binds to the 
ADD domain by forming a three-stranded anti-parallel β-sheet with 
two β-strands of the ADD domain (Extended Data Fig. 7f). The side chain 
of H3K4 is stabilized by residues D529 and D531 of the ADD domain, 
and tri-methylation of H3K4 will disrupt the interactions. Mutations 
D529A, D531A, and M548W of ADD-CD abolished their binding affin- 
ity to histone H3 peptide (Extended Data Fig. 7g). 
Comparison of the structures of ADD-CD-C^DNMT3L in autoinhibitory 
and active forms indicates two separate surfaces on the CD domain for 
ADD-CD interaction, and suggests a conformational change of DNMT3A 


Figure 3 | The ADD domain inhibits DNA-binding affinity of the catalytic 
domain. a, Superimposition of ADD-CD-C^DNMT3L and HhaI-DNA complex 
structures shown in ribbon representations with the C^DNMT3L domain omitted 
for simplicity. The colour scheme for the comparison is indicated. 
b, c, HhaI-DNA (b) and ADD-CD (c) structures are shown as in a. 
d, e, Superimposed fluorescence polarization plots for DNA-binding affinities 
of truncations (d) or mutants (e) of DNMT3A in the absence or 
Figure 4 | Mechanisms for autoinhibition and histone H3 tail-induced 
activation of DNMT3A. a, Ribbon representations of the overall structure of 
the ADD-CD-C^DNMT3L-H3 complex. Histone H3 peptides are coloured in 
yellow. b, The ADD-CD-C^DNMT3L structure in autoinhibitory form is shown 
for comparison. c, Ribbon representations of ADD-CD-C^DNMT3L 
(autoinhibitory form) and ADD-CD-C^DNMT3L-H3 (active form) complex 
structures with the CD domains superimposed. d, Superimposition of 


induced by histone H3 (Fig. 4a-c and Extended Data Fig. 8a). DNMT3A 
may prefer an autoinhibitory form in the absence of histone H3. Upon 
histone H3 association, DNMT3A adopts an active form to open an 
otherwise locked DNA-binding region for establishment of DNA methylation. 
The negative-stain electron microscopy density maps clearly 
showed distinct shapes for the DNMT3A2 (residues 275-912)-C^DNMT3L 
complex in the absence and presence of histone H3 peptide, supporting 
a conformational change of DNMT3A2-C^DNMT3L induced by histone 
H3: that is, transforming from an ‘X’ shape (autoinhibitory form) to a 
butterfly shape (active form) (Extended Data Fig. 8b-d). One-dimensional 
19F NMR measurement also indicated a significant change of chemical 
shift (representing conformational change) for residue F827 (within the 
loop L2) upon histone H3 peptide (but not H3K4me3) titration, whereas 
only a slight change was observed for residue F868 (as a negative control) 
(Extended Data Fig. 8e, f). Collectively, both electron microscopy and 
NMR measurements support the existence of conformational change 
of DNMT3A induced by histone H3 tail. 

Superimposition of H3-ADD (PDB accession number 3A1B) and 
ADD-CD-C^DNMT3L (autoinhibitory form) structures indicated that 
ADD domains in the two structures adopt a similar fold, and that histone 
H3 peptide in the H3-ADD structure has no overlap with the ADD- 
CD interface (Fig. 4d). Our previous studies have shown that mutation 
D529A or D531A not only significantly decreased H3-ADD interaction 
(Extended Data Fig. 7g) but also markedly decreased ADD-CD inter- 
action (Fig. 2a), and released the autoinhibition (Fig. 2g, h). Moreover, 
H3K4me0 peptide could disrupt ADD-CD interaction (Fig. 2b). Thus, 
histone H3 releases the autoinhibition of DNMT3A through binding 
to residues of the ADD domain on the ADD-CD interface (such as D529 
and D531) and disrupting the intramolecular interaction. The two resi- 
dues function as a critical switch that can exist in both forms of DNMT3A 
and couple histone H3 recognition to the release of the autoinhibition. 


ADD-CD-C^DNMT3L (C^DNMT3L omitted for simplicity) and H3-ADD 
structures. The ADD domain and histone H3 in the H3-ADD structure are 
coloured in grey and yellow, respectively. Residues for H3-ADD and ADD-CD 
interactions are shown in stick representation. e, A working model for the 
autoinhibition and histone H3 tail-induced activation of DNMT3A. DNMT3L 
and the PWWP domain of DNMT3A are not shown for simplicity. 


Here we propose a working model for histone H3-induced dynamic 
regulation of the de novo DNA methyltransferase (Fig. 4e and Supplemen- 
tary Video 1). DNMT3A exists in dynamic equilibrium between auto- 
inhibitory and active forms, and the ADD domain oscillates between 
the two conformations. In the absence of histone H3, DNMT3A pre- 
fers an autoinhibitory form, in which the ADD domain binds to the CD 
domain and hinders its DNA-binding affinity. Once DNMT3A (or the 
DNMT3A-DNMT3L complex) is recruited to the nucleosome, H3K4me0 
binds to the ADD domain and stimulates DNMT3A to undergo a signi- 
ficant conformational change from an autoinhibitory form to an active 
form. The ADD-CD interaction in the active form allows DNMT3A to 
adopt a relatively stable conformation so that DNA methylation occurs 
in a range permitted by the histone H3 tail. Even when DNMT3A is 
recruited to an H3K4me3-containing nucleosome, the enzyme will remain 
in its autoinhibitory form to avoid DNA methylation within an imper- 
missible chromatin environment. Our work reinforces the connection 
between DNA methylation and histone modifications, and sheds new 
light on the fine tuning of the establishment of DNA methylation. The 
structures may also provide a basis for the design of specific regulators 
for potential therapeutic applications. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Protein expression and purification. The full-length DNMT3A2 (residues 224- 
912 of DNMT3A) was expressed in Sf9 cells using the Bac-to-Bac system (Invitrogen). 
The infected cells were harvested and lysed in 50 mM Tris-Cl pH 8.0, 500 mM NaCl, 
and 0.01% 2-mercaptoethanol. The clarified lysate was applied onto GST affinity 
columns (GE Healthcare) and the fusion protein was cleaved with PreScission pro- 
tease. The protein was stored in 20 mM Tris-Cl pH 8.0, 300 mM NaCl, and 0.01% 
2-mercaptoethanol for the assays. 

Truncations of DNMT3A2 were cloned into modified pGEX-6p-1 vector and 
the C^DNMT3L (residues 178-379) was inserted into modified pRSFDuet-1 vector. 
DNMT3A proteins were expressed independently or with C^DNMT3L in Escherichia 
coli strain BL21(DE3). The transformants were grown at 37 °C to an attenuance 
(D600nm) of 0.6 in 2× YT medium. The cultures were induced by adding 1 mM 
isopropyl-β-D-thiogalactopyranoside and further incubated for 16 h at 15 °C. The 
supernatant of cell lysate was applied onto GST affinity columns (GE Healthcare) 
and the fusion protein was digested with PreScission protease. The eluted protein 
was purified by ion exchange and gel filtration chromatography. The purified 
proteins were subjected to SDS-PAGE and stained by Coomassie blue. The peak fractions 
were concentrated to 5-10 mg ml⁻¹ and used for crystallization and biochemical 
assays. Mutants of DNMT3A were purified by a similar procedure. 
Crystallization and structure determination. For crystallization of the 
autoinhibitory form of DNMT3A, the complex of ADD-CD of DNMT3A (residues 455- 
912) and C^DNMT3L was mixed with a palindromic 18-base-pair DNA duplex 
(5'-GAGGCTAGCGCTAGCCTC-3') and AdoHcy in a 1:1.2:2 molar ratio. The crystals 
were obtained using the hanging-drop, vapour-diffusion method by mixing 1 µl 
ADD-CD-C^DNMT3L complex with 1 µl reservoir solution containing 0.05 M Bis- 
Tris pH 5.6-6.0, 0.1 M sodium malonate and 8% PEG3350 at 4 °C. Crystals were 
cryoprotected by the reservoir buffer containing 22% glycerol and then flash frozen 
in liquid nitrogen. Although the DNA duplex was added for the crystallization, no 
corresponding electron density was observed. The DNA duplex may function as an 
additive to favour the crystallization. 

For the crystallization of the active form of DNMT3A, the complex of ADD-CD 
of DNMT3A (residues 476-912) and C^DNMT3L was pre-incubated with histone H3 
peptide (residues 1-12, ARTKQTARKSTG) at a 1:10 molar ratio before 
crystallization. Crystals were grown by the hanging-drop, vapour-diffusion method by 
mixing 1 µl protein (10 mg ml⁻¹) with 1 µl reservoir solution containing 100 mM 
sodium acetate (pH 5.3-5.6) and 600 mM ammonium sulphate at 18 °C. Crystals 
were cryoprotected by the reservoir buffer with 25% ethylene glycol and then flash 
frozen in liquid nitrogen. 

The data were collected on beamline BL17U at Shanghai Synchrotron Radiation 
Facility, China, at wavelengths of 1.2816 Å and 0.9792 Å, respectively. Data were 
indexed, integrated, and scaled using the program HKL2000 (ref. 26). The 
orientation and position of CD-C^DNMT3L in the ADD-CD-C^DNMT3L complex were first 
determined by molecular replacement using CD-C^DNMT3L (PDB accession number 
2QRV) as a searching model in the PHASER program. The resulting CD-C^DNMT3L 
model was refined with the PHENIX package. The ADD domain was then 
placed into the refined extra-difference density using MOLREP with the ‘search for 
model in the map’ module. The overall structure of the ADD-CD-C^DNMT3L complex 
was finally refined with stereochemistry and the reference structure of the ADD domain 
(PDB accession number 3A1B) as restraints. The anomalous Fourier map of zinc 
cations in the ADD domain confirmed the validity of the position of the ADD domain. 
The structure of the ADD-CD-C^DNMT3L-H3 complex was determined by molecular 
replacement using the ADD domain (PDB accession number 3A1B) and CD-C^DNMT3L 
(PDB accession number 2QRV) as searching models in the PHASER 
program, and was then manually built by COOT. 

All refinements used the module phenix.refine of PHENIX. The model quality 
was checked with the PROCHECK program. In the structure of ADD-CD-C^DNMT3L, 
85.7% of residues were in most favoured regions, 12.9% in additional 
allowed regions, 0.9% in generously allowed regions, and 0.5% in disallowed regions. 
In the structure of ADD-CD-C^DNMT3L-H3, 88.4% of residues were in most favoured 
regions, 11% in additional allowed regions, and 0.6% in generously allowed regions. 
All structure figures were generated by PyMol. 

In vitro DNA methylation assay on naked DNA. The enzymatic activity of 
DNMT3A proteins was assessed by incorporation of a 3H-labelled methyl group 
from S-adenosyl-L-[methyl-3H]methionine ([methyl-3H]AdoMet, PerkinElmer). 
A biotin-labelled DNA fragment amplified from the EBNA1 region of p220.2 
(1.2 kilobases, 52 CG sites) was used as a substrate. For histone H3 stimulation, 
0.3 µM DNMT3A proteins were pre-incubated with or without 3 µM histone H3 
peptides (residues 1-12). For ADD-mediated inhibition, CD proteins (1 µM) were 
supplemented with or without the ADD domain proteins (18 µM) on ice for 30 min. 
DNA (100 ng) was methylated by DNMT3A proteins in the presence of 2.5 µM 
[methyl-3H]AdoMet, 25 mM Tris-HCl (pH 7.5), 5% glycerol, 0.01% 2-mercaptoethanol, 
and 0.5 mg ml⁻¹ BSA. The reactions were incubated at 37 °C for 30 min, and terminated 
by adding cold wash buffer (500 mM NaCl and 1 mM EDTA in PBST). The DNA 
products were immobilized on streptavidin beads, washed five times, and subjected to 
liquid-scintillation counting (PerkinElmer). Each reaction was performed in triplicate. 
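
To give a feel for the substrate amounts quoted above, the following minimal sketch converts the DNA mass into picomoles of fragment and of CG sites. It assumes an average mass of 650 g/mol per base pair, a common textbook approximation that is not stated in the Methods. The function and constant names are illustrative only.

```python
# Assumed average molar mass of one DNA base pair (g/mol); not from the Methods.
AVG_BP_MASS_G_PER_MOL = 650.0

def substrate_amounts(mass_ng, length_bp, cg_sites):
    """Return (pmol of DNA fragment, pmol of CG sites) for a dsDNA substrate."""
    molar_mass = length_bp * AVG_BP_MASS_G_PER_MOL   # g/mol for the whole fragment
    dna_pmol = mass_ng * 1e-9 / molar_mass * 1e12     # ng -> g -> mol -> pmol
    return dna_pmol, dna_pmol * cg_sites

# Naked-DNA assay: 100 ng of a 1.2-kilobase fragment carrying 52 CG sites.
dna_pmol, cg_pmol = substrate_amounts(mass_ng=100, length_bp=1200, cg_sites=52)
print(f"{dna_pmol:.3f} pmol fragment, {cg_pmol:.2f} pmol CG sites")
```

Under this assumption the reaction contains roughly 0.13 pmol of fragment and about 6.7 pmol of CG sites. The reaction volume is not given in the Methods, so no concentrations are derived here.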
In vitro DNA methylation assay on recombinant poly-nucleosomes. Recombinant 
Xenopus laevis histones were expressed and purified as described. Site-specific 
methylation of H3Kc4me3 was performed by the methyl-lysine analogue approach. 
Incorporation of H3Kc4me3 was verified by specific antibodies (anti-H3K4me3: 
Cell Signaling, 9751S; anti-H3: Abcam, ab1791) and visualized on a Tanon-5200 
Chemiluminescent Imaging System (Tanon Science & Technology). Assembly of histone 
octamers and reconstitution of poly-nucleosomes were performed by salt dialysis 
using the 601 sequence (30 CG sites). The reaction mixture was the same as 
described above except that 300 ng nucleosomal DNA was methylated using 1 µM 
DNMT3A proteins. The reaction was quenched by the addition of excess TE 
buffer and 1% SDS. The histone proteins were removed by phenol-chloroform 
extraction. The DNA was then purified by ethanol precipitation, dissolved in TE, and 
subjected to liquid-scintillation counting (PerkinElmer). Each reaction was 
performed in triplicate. 

Electrophoretic mobility-shift assay. A 6-carboxy-fluorescein (FAM)-labelled 
double-stranded DNA (dsDNA) containing one CpG site was generated by 
annealing two primers (upper primer, FAM-5'-CTGAATACTACTTGCGCTCTCTAACCTGAT-3'; 
lower primer, 5'-GACTTATGATGAACGCGAGAGATTGGACTA-3'). 
The FAM-labelled DNA was used both in electrophoretic mobility- 
shift assays and fluorescence polarization assays. DNA (25 nM) and the indicated 
amounts of proteins were incubated in reaction buffer containing 20 mM HEPES 
pH 7.5, 100 mM KCl, 8% glycerol, and 0.5 mg ml⁻¹ BSA for 30 min at 25 °C. The 
samples were subjected to 12% PAGE and analysed by Typhoon FLA 9500 (GE 
Healthcare) image scanning. 

Fluorescence polarization assay. FAM-labelled dsDNA (15 nM) was incubated 
with increasing amounts of DNMT3A proteins for 30 min at 25 °C in reaction buffer 
containing 20 mM HEPES pH 7.5, 100 mM KCl, 8% glycerol, and 0.5 mg ml⁻¹ BSA. 
Fluorescence polarization measurements were performed on a Synergy 4 Microplate 
Reader (BioTek) at 25 °C. The bound fractions were calculated as (mP − baseline 
mP)/(maximum mP − baseline mP), in which mP (milli-polarization units) 
represents the fluorescence polarization value. For the peptide stimulation experiment, 
15 nM DNA was pre-incubated with 0.2 µM CD-C^DNMT3L or 1 µM ADD-CD-C^DNMT3L 
protein. An increasing amount of histone H3 peptide (H3K4me0 or 
H3K4me3) was then added into the protein-DNA complex. The levels of protein- 
DNA complex formation were measured by fluorescence polarization. Each 
reaction was performed in triplicate. The curves were fitted using GraphPad Prism 5. 
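
The bound-fraction normalization described above can be sketched in a few lines. This is only an illustration of the stated formula. The function name and the example readings are hypothetical and are not data from this study.

```python
def bound_fraction(mp, baseline_mp, maximum_mp):
    """Fraction of DNA bound, from fluorescence polarization readings in mP.

    Implements (mP - baseline mP) / (maximum mP - baseline mP).
    """
    span = maximum_mp - baseline_mp
    if span <= 0:
        raise ValueError("maximum mP must exceed baseline mP")
    return (mp - baseline_mp) / span

# Hypothetical titration readings in mP (illustrative values only).
readings = [52.0, 80.0, 131.0, 170.0, 178.0]
fractions = [bound_fraction(mp, baseline_mp=52.0, maximum_mp=178.0)
             for mp in readings]
print([round(f, 3) for f in fractions])
```

The normalized values run from 0 (free DNA) to 1 (saturation) and are the quantities that would be passed to a curve-fitting program.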
Isothermal titration calorimetry. To obtain the binding affinity between 
ADD-CD-C^DNMT3L and H3 peptide, 0.04 mM ADD-CD-C^DNMT3L (in cell) in the absence 
or presence of 0.5 mM dsDNA (30 base pairs, upper strand: 
5'-CTGAATACTACTTGCGCTCTCTAACCTGAT-3') was titrated with 0.5 mM histone H3 peptide 
(residues 1-12) (in syringe) using an iTC200 microcalorimeter (GE Healthcare) at 
18 °C. Protein, DNA, and peptide were prepared in a buffer containing 10 mM 
HEPES, pH 8.0, 100 mM NaCl, and 0.5 mM TCEP. To obtain the binding affinity 
between the ADD domain and H3 peptide, 0.05 mM ADD protein (in cell) was 
titrated with 0.5 mM histone H3 peptide (residues 1-12) (in syringe). The data 
were fitted by Origin 7.0 software. 
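
As a rough plausibility check on these titration conditions, one can compute the Wiseman c-value, c = n × [macromolecule in cell] / Kd. The sketch below assumes a 1:1 stoichiometry (n = 1), which is not stated in the Methods. It uses the cell concentrations given above together with the binding affinities listed in Extended Data Fig. 6b (1.75 µM and 2.14 µM). The function name is illustrative.

```python
def wiseman_c(cell_conc_um, kd_um, n_sites=1.0):
    """Wiseman c-value for an ITC experiment; concentrations and Kd in µM."""
    if kd_um <= 0:
        raise ValueError("Kd must be positive")
    return n_sites * cell_conc_um / kd_um

# n_sites = 1 is an assumption, not a value reported in the paper.
c_add = wiseman_c(cell_conc_um=50.0, kd_um=1.75)       # isolated ADD domain
c_complex = wiseman_c(cell_conc_um=40.0, kd_um=2.14)   # ADD-CD-C^DNMT3L
print(round(c_add, 1), round(c_complex, 1))
```

Values of c between roughly 10 and 500 are usually regarded as favourable for fitting a binding isotherm, so both titrations would sit in a workable range under these assumptions.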

In vitro binding assay. For the GST pull-down assay, 30 µg DNMT3A-CD 
proteins were incubated with 8 µg GST-ADD-linker fusion proteins for 1 h at 4 °C 
in binding buffer containing 20 mM Tris-HCl pH 8.0, 100 mM NaCl, 0.01% 
2-mercaptoethanol, 5% glycerol, and 0.1% Triton X-100. GST-ADD-linker 
proteins were then immobilized to 25 µl of glutathione resin (GE Healthcare) for 1 h 
at 4 °C. After washing three times with binding buffer, bound proteins were 
subjected to SDS-PAGE and stained by Coomassie blue. 

For the histone peptide pull-down assay, 1 µg biotinylated histone H3 peptide 
(residues 1-21) was incubated with 30 µg wild-type and mutant ADD-CD-C^DNMT3L 
proteins for 1 h at 4 °C in binding buffer containing 20 mM Tris-HCl pH 8.0, 250 mM 
NaCl, 0.01% 2-mercaptoethanol, 5% glycerol, and 0.1% Triton X-100. Then 20 µl 
streptavidin beads were added into the mixture and incubated for 1 h at 4 °C. After 
washing three times with binding buffer, bound proteins were subjected to SDS- 
PAGE and stained by Coomassie blue. 

Expression of 19F-labelled proteins. An orthogonal tRNA/tRNA synthetase 
system was used to incorporate a 19F-labelled unnatural amino acid. Briefly, a TAG stop 
codon was introduced into the desired site to encode L-4-trifluoromethylphenylalanine 
(tfmF). A modified pEVOL-tfmFRS plasmid to express tRNA(CUA) and tfmF-specific 
aminoacyl-tRNA synthetase was co-transformed with the TAG-carrying plasmid into BL21 
(DE3). The bacterial culture was induced with 0.02% L-arabinose, 1 mM tfmF, and 
1 mM IPTG. The 19F-labelled proteins were purified as for wild-type DNMT3A 
proteins. 

19F NMR spectra measurements. All one-dimensional 19F NMR spectra 
measurements were performed at 293 K on an Agilent 500 MHz spectrometer equipped 
with an HFT probe, and the observation channel was tuned to 19F (470.2 MHz), 
with 1,024 free induction decay accumulations and a 4 s recycling delay. A 
one-dimensional 19F spectrum was acquired with a one-pulse program with a 90° pulse 
width of 12.45 µs and power at 57 W. The spectrum width was 60 p.p.m. and the offset 
was −62 p.p.m. 19F chemical shifts were referenced to an external standard, tfmF 
(−62.38 p.p.m.). The data were processed and plotted with an exponential window 
function (line broadening = 20 Hz) using ACD/NMR Processor Academic Edition 
software (ACD/Labs). The spectra of 0.14 mM ADD-CD F686tfmF or 0.12 mM 
ADD-CD F827tfmF with or without histone H3 peptides (residues 1-12) were 
collected at 293 K. 
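
The acquisition parameters above are quoted partly in p.p.m. and partly in Hz. The short sketch below converts between the two at the stated 19F observation frequency and estimates a lower bound on the time per spectrum. The lower bound assumes that the 4 s recycling delay dominates each scan, an assumption not spelled out in the Methods. Function names are illustrative.

```python
OBSERVE_MHZ = 470.2  # 19F observation frequency stated in the Methods

def ppm_to_hz(ppm, observe_mhz=OBSERVE_MHZ):
    """1 p.p.m. corresponds to observe_mhz Hz at a given observation frequency."""
    return ppm * observe_mhz

def hz_to_ppm(hz, observe_mhz=OBSERVE_MHZ):
    return hz / observe_mhz

spectral_width_hz = ppm_to_hz(60.0)     # 60 p.p.m. spectrum width in Hz
line_broadening_ppm = hz_to_ppm(20.0)   # 20 Hz exponential line broadening in p.p.m.
min_time_min = 1024 * 4.0 / 60.0        # scans x recycling delay, in minutes

print(f"{spectral_width_hz:.0f} Hz, {line_broadening_ppm:.4f} p.p.m., "
      f"at least {min_time_min:.0f} min per spectrum")
```

On these figures the spectrum width is about 28.2 kHz, the 20 Hz line broadening corresponds to roughly 0.04 p.p.m., and each spectrum takes a little over an hour.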
Electron microscopy data collection. DNMT3A2-C^DNMT3L in the absence or 
presence of histone H3 peptide was analysed by negative-stain electron microscopy 
in the same manner. The samples were prepared by dilution of purified protein 
complex to 6.15 µg ml⁻¹, then 4-5 µl of this sample was deposited onto a glow-discharged 
400 mesh continuous carbon grid (Beijing Zhongjingkeyi Technology). The 
sample was then stained with 2% uranyl formate and air-dried. Data were recorded on 
a Tecnai G2 F20 TWIN transmission electron microscope (FEI) equipped with a 
field-emission gun operated at 200 kV. Images were recorded at ×71,000 
microscope magnification on a 4k × 4k Eagle CCD (charge-coupled device) camera with 
a pixel size of 1.15 Å per pixel. The defocus ranged from −0.5 to −0.8 µm. 
Electron microscopy image processing and three-dimensional reconstruction. 
For this, 9,864 and 21,910 particles were boxed out for DNMT3A2-C^DNMT3L and 
DNMT3A2-C^DNMT3L-H3, respectively, by using the e2boxer.py program in EMAN2.1 
(ref. 37). Contrast transfer function parameters were determined for particles boxed 
out from each CCD image using the EMAN1.9 procedure ctfit, followed by phase 
flipping using the applyctf program. The data were then low-pass filtered to 10 Å to 
enhance the image contrast for three-dimensional reconstruction. Reference- 
free two-dimensional analysis used the EMAN1.9 program refine2d.py and IMAGIC, 
and those class-averages were used to generate initial models by e2initialmodel.py 
in EMAN2.1. Three-dimensional reconstruction was performed by the EMAN1.9 
program refine. Initially, no symmetry was imposed in the reconstruction 
process, and the resulting three-dimensional reconstruction revealed the existence 
of a two-fold symmetry in both maps but in different locations, which was 
subsequently imposed in the reconstruction process. The final resolution was estimated 
at 24 Å and 20 Å, respectively, by the 0.5 FSC criterion using the eotest program in 
EMAN1.9. 
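
The 0.5 FSC criterion amounts to finding the spatial frequency at which the Fourier shell correlation between two half-data-set reconstructions first falls below 0.5 and reporting its reciprocal. The sketch below is a generic illustration of that read-out using linear interpolation; it is not the EMAN eotest implementation. The curve values are hypothetical and are not data from this study.

```python
def resolution_at_fsc_threshold(frequencies, fsc_values, threshold=0.5):
    """Resolution (Å) at which an FSC curve first drops below `threshold`.

    frequencies: spatial frequencies in 1/Å, ascending and positive.
    fsc_values:  FSC value at each frequency.
    Returns None if the curve never falls below the threshold.
    """
    if len(frequencies) != len(fsc_values) or not frequencies:
        raise ValueError("need two non-empty sequences of equal length")
    if fsc_values[0] < threshold:
        return 1.0 / frequencies[0]
    for i in range(1, len(fsc_values)):
        prev, curr = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > curr:
            # Linear interpolation between the two bracketing shells.
            t = (prev - threshold) / (prev - curr)
            crossing = frequencies[i - 1] + t * (frequencies[i] - frequencies[i - 1])
            return 1.0 / crossing
    return None

# Hypothetical FSC curve (illustrative values only).
freqs = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
fsc = [0.99, 0.95, 0.85, 0.60, 0.40, 0.20]
estimate = resolution_at_fsc_threshold(freqs, fsc)
print(f"{estimate:.1f} Å")
```

With a pixel size of 1.15 Å the Nyquist limit is 2.3 Å, so estimates of 24 Å and 20 Å lie far from the sampling limit, as expected for negative-stain data.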

UCSF Chimera (http://www.cgl.ucsf.edu/chimera/) was used to render the 
electron microscopy density together with the crystal structures. In addition, the Fit in 
Map module in Chimera was used for rigid-body fitting of the crystal structure into 
the corresponding electron microscopy density map. 
Extended Data Figure 1 | In vitro DNA methyltransferase activity of 
DNMT3A. a, In vitro DNA methyltransferase activities of DNMT3A2 
(purified from insect cells) in the absence or presence of C^DNMT3L. The assays 
were performed in the presence or absence of histone H3 peptide (residues 
1-12). Note that C^DNMT3L could enhance the activity of DNMT3A2 by a factor 
of 2-3, which is consistent with a previous study. However, histone H3- 
mediated activation of DNMT3A is independent of the existence of C^DNMT3L. 
b, In vitro DNA methyltransferase activities of DNMT3A2 (purified from 
insect cells) or DNMT3A2-C^DNMT3L (purified from bacteria) in the presence or 
absence of histone H3 peptides. c, Enzymatic activities of DNMT3A2 (purified 
from insect cells) or DNMT3A2-C^DNMT3L (purified from bacteria) using 
reconstituted nucleosomes as substrates. Nucleosomes containing unmodified 
histone H3 or H3Kc4me3 were subjected to SDS-PAGE and visualized using 
specific antibodies. d, e, Enzymatic activities of various N-terminal deletions of 
DNMT3A2 in the absence (d) or presence (e) of C^DNMT3L. Corresponding 
relative activities are indicated at the bottom of each figure. CPM, counts per 
minute. Error bars, s.d. for triplicate experiments. The ADD-CD or CD protein 
purified from bacteria was not stable in solution and tended to precipitate out, 
which may have resulted in their lower activities under our experimental 
conditions (compared with DNMT3A2 purified from insect cells). Because 
C^DNMT3L could stabilize DNMT3A and had no effect on histone H3-mediated 
activation, protein complexes ADD-CD-C^DNMT3L and CD-C^DNMT3L were 
used in the following studies if not specified. 
Extended Data Figure 2 | Crystal structure of ADD-CD-C^DNMT3L in 
autoinhibitory form. a, b, Two different views of the 2Fobserved − Fcalculated 
maps for C^DNMT3L (a) and CD (b) domains in the ADD-CD-C^DNMT3L 
structure. The maps were calculated at 3.82 Å and contoured at 1.5σ. Only 
main-chains are shown for simplicity. c, The 2Fobserved − Fcalculated maps for the 
ADD domain after refinement of the CD-C^DNMT3L complex (top) and after 
refinement of the ADD-CD-C^DNMT3L complex (bottom). The maps were 
calculated at 3.82 Å and contoured at 0.8σ. Main-chains from most residues, 
including residues 526-533 involved in the interaction with the CD domain, fit 
well into the electron density. Some loop regions were not well covered by 
electron density, which is consistent with a high B factor (Extended Data 
Fig. 8a) of the ADD domain in the complex structure, supporting the dynamic 
feature of the ADD domain for regulating enzymatic activity of DNMT3A. 
d, Zn-anomalous difference map contoured at 3.5σ shows the positions of zinc 
cations in the ADD domain. e, Gel filtration profiles for standard proteins and 
the ADD-CD-C^DNMT3L complex. The peak position corresponds to the dimer 
of ADD-CD-C^DNMT3L with a molecular mass of about 140 kDa. f, Dimer 
formation of the ADD-CD-C^DNMT3L complex in crystals. The dimer of 
ADD-CD-C^DNMT3L complexes is mediated by CD-CD interaction in a 
two-fold crystallographic symmetry. Given the difficulty in tracing the 
conformation of the side chains in the 3.82 Å resolution structure, we have not 
discussed the specific hydrogen bonds or hydrophobic interactions within 
ADD-CD-C^DNMT3L. Residues 832-846 of DNMT3A were not built in the 
model because they lacked electron density, which may have resulted from their 
flexibility in crystals. 


©2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 3 | Structure of ADD-CD-C^DNMT3L in 
autoinhibitory form. a, Superimposition of human ADD-CD-C^DNMT3L with 
mouse CD^Dnmt3a-C^Dnmt3L (lack of ADD domain, PDB accession number 
2QRV) structures shown in ribbon representations. CD-C^DNMT3L in the two 
structures is well aligned with a root mean squared deviation of 1.28 Å for 723 
Cα atoms aligned. The function of DNMT3A-DNMT3L complex dimerization has 
been characterized in a previous study. The functions and structures of the 
CD and C^DNMT3L domains, and the CD-CD and CD-C^DNMT3L interfaces, were 
not discussed in this work. b, Overall structure of ADD-CD-C^DNMT3L with 
CD-C^DNMT3L shown in electrostatic potential surface, and the ADD domain 
and linker shown in ribbon representation. The linker packs against a 
hydrophobic surface of the CD domain. c, d, Close-up view of linker-CD 
(c) and ADD-CD (d) interfaces with the electrostatic potential surface of the 
CD domain indicated. Critical residues are shown in stick representation. 
e, Superimposition of ADD-CD-C^DNMT3L and DNMT3L-H3 structures 
(PDB accession number 2PVC) shown in ribbon representations in two 
different views. The CD domain and C-like domain of DNMT3L were aligned 
for comparison. Note that the extended loop of the ADD domain in 
ADD-CD-C^DNMT3L overlaps with an α helix in the DNMT3L-H3 structure. DNMT3L 
is unlikely to adopt a similar conformation to that of ADD-CD-C^DNMT3L 
because otherwise the ADD domain would have steric hindrance with the C-like 
domain of DNMT3L (dashed circle). According to the above analyses, the 
structure of the autoinhibitory form of DNMT3A could not be predicted on the 
basis of the DNMT3L structure because the overall structures of DNMT3A and 
DNMT3L are different. 
Extended Data Figure 4 | Sequence alignment of DNMT3 family members. 
Sequences of human DNMT3A (NP_072046), DNMT3B (NP_008823), 
DNMT3L (NP_787063), mouse Dnmt3a (NP_031898), zebrafish Dnmt3a 
(NP_001018150), and DNA methyltransferase from Haemophilus 
parahaemolyticus (WP_005706946) were used in the alignment. Highly conserved 
and identical residues are highlighted with dark green background, and 
conserved residues are indicated with light green background. Secondary 
structural elements are coloured as in Fig. 1a and indicated above the sequences. 
Invisible residues in the structure of ADD-CD-C^DNMT3L are indicated as 
dashed lines above the sequences. Residues involved in ADD-CD interactions 
in active form or autoinhibitory form are indicated as black stars and red 
triangles, respectively. Residues involved in H3-ADD interactions are 
indicated as blue squares. 
Extended Data Figure 5 | Interactions between the ADD and CD domains. 
a, b, GST pull-down assays with recombinant CD (residues 627-912) 
protein incubated with wild-type or mutant GST-ADD-linker proteins 
immobilized on glutathione resin. The bound proteins were analysed by 
SDS-PAGE and Coomassie blue staining. c, GST pull-down assays using 
wild-type or mutant CD domain. d, GST pull-down assays in the absence 
or presence of histone H3 peptide (H3K4me0 or H3K4me3). e, GST pull-down 
assays with the CD domain incubated with GST-ADD or GST-ADD-linker 
proteins immobilized on glutathione resin. f, Activities of wild-type and mutant 
ADD-CD. Residues 621-632 were replaced by a GS linker in the 
mutant proteins. 
Extended Data Figure 6 | Interactions between DNMT3A and DNA. 
a, Enzymatic activities of wild-type and mutant ADD-CD-C^DNMT3L. Residues 
on the missing loop (residues 831-846) were mutated for the in vitro DNA 
methyltransferase activity assay. Error bars, s.d. for triplicate experiments. 
Mutating the above residues led to loss of activity of ADD-CD-C^DNMT3L, 
supporting their important role in catalysis or DNA recognition. The missing 
loop in the ADD-CD-C^DNMT3L structure is equivalent to a DNA-binding 
loop in the HhaI-DNA structure. b, DNA has no effect on the interaction 
between histone H3 and DNMT3A. Left, isothermal titration calorimetry 
enthalpy plot for the binding of isolated ADD domain (in cell) to histone H3 
peptide (residues 1-12, in syringe), with the estimated binding affinities (Kd) 
listed. Right, superimposed isothermal titration calorimetry enthalpy plots for 
the binding of ADD-CD-C^DNMT3L (in cell) to histone H3 peptide (residues 
1-12, in syringe) in the absence or presence of dsDNA. The estimated 
binding affinities (Kd) are listed. Histone H3 peptide has comparable binding 
affinity to the ADD domain alone (1.75 µM) and ADD-CD-C^DNMT3L in 
autoinhibitory form (2.14 µM), and the addition of DNA was not able to 
enhance the binding affinity further. The presence of DNA led to a slight 
decrease in the binding affinity between histone H3 peptide and 
ADD-CD-C^DNMT3L, which may have resulted from slight precipitation of the protein 
caused by the high concentration of DNA used for titration. c, Electrophoretic 
mobility-shift assays for DNMT3A proteins in the absence or presence of 
histone H3 peptide, with protein concentrations indicated. H3-ADD-CD 
represents a fusion protein with histone H3 (residues 1-20) at the N terminus of 
ADD-CD. The assays showed that CD-C^DNMT3L strongly bound to the FAM- 
labelled DNA duplex, whereas the existence of the ADD domain markedly 
decreased DNA-binding affinity, which was partly restored by the addition of 
histone H3 peptide or largely restored by H3-ADD-CD fusion protein. 



Extended Data Figure 7 | Structure of ADD-CD-C^DNMT3L-H3 in active 
form. a, Ribbon representations of the overall structures of 
ADD-CD-C^DNMT3L in active (left) and autoinhibitory (right) forms. Histone H3 peptides 
are coloured in yellow. b, Structural comparison of human 
ADD-CD-C^DNMT3L-H3 and mouse CD^Dnmt3a-C^Dnmt3L complexes. The compared 
structures are shown in ribbon representations. The ADD-CD-C^DNMT3L 
complex structure (this study) is coloured as in Fig. 1d, and the 
CD^Dnmt3a-C^Dnmt3L complex structure is coloured in grey. Residues 611-620 and 
833-846 of DNMT3A were not built in the model because they lacked electron 
density. c, LIGPLOT representation of the ADD-CD interactions in the 
ADD-CD-C^DNMT3L-H3 structure. Carbon, oxygen, and nitrogen are shown 
as black, red, and blue balls, respectively. Hydrogen bonds are indicated as 
dashed lines, with lengths given in Å. d, Close-up view of the ADD-CD 
interface. Critical residues for the interactions are shown in stick 
representation, and hydrogen bonds are indicated as dashed lines. The C 
terminus (residues 903-911) of the CD domain and a loop region (residues 
621-632) together form a flat patch for interaction with the ADD domain. 
Hydrogen bonds are formed between residues N551, N553, and R556 of the 
ADD domain and residues E629, C911, and E907 of the CD domain. Residues 
Y526, Y528, W601, and F609 of the ADD domain, V622 and P625 of the linker, 
and R803 and P904 of the CD domain are involved in hydrophobic 
interactions. e, Structural comparison of ADD-CD-H3 in 
ADD-CD-C^DNMT3L-H3 (this study) and DNMT3L-H3 structures (PDB accession 
number 2PVC). The two compared structures are shown in ribbon 
representations with ADD domains (left) or catalytic domains (right) 
aligned, respectively. The DNMT3L-H3 structure is coloured in grey. When 
the ADD domains are superimposed, the catalytic domain moves by up to 
19 Å. When the CD domains are superimposed, the ADD 
domain moves 6 Å. f, Close-up view of the H3-ADD interface. Critical residues 
for the interactions are shown in stick representation, and hydrogen bonds 
are indicated as dashed lines. The mode of histone H3-ADD interaction is 
similar to that observed in the structure of the H3-ADD fusion protein. 
g, Histone H3 peptide pull-down assay. Recombinant wild-type and mutant 
ADD-CD-C^DNMT3L proteins were incubated with biotinylated histone H3 
peptide (residues 1-21) and immobilized onto streptavidin sepharose beads. 
Bound proteins were subjected to SDS-PAGE and stained by Coomassie blue. 
Extended Data Figure 8 | Conformational change of DNMT3A induced by
histone H3 tail. a, Average B factors for domains of ADD-CD-C^DNMT3L in the
structures of ADD-CD-C^DNMT3L and ADD-CD-C^DNMT3L bound to H3
peptide. The average B factor of the ADD domain is higher than other domains
in both structures, and is higher in autoinhibitory form (177.5 Å²) than that in
active form (107.6 Å²). The results indicate that the ADD domain is more
dynamic than other domains of the complex, especially in its autoinhibitory
form. The observation further supports the idea that DNMT3A undergoes
conformational changes on the ADD domain induced by histone H3. b, Two
different views of the electron microscopy density maps of
DNMT3A2-C^DNMT3L (left) and DNMT3A2-C^DNMT3L-H3 (right) processed to 24 Å and
20 Å resolution, respectively. The corresponding crystal structure was fitted
into the electron microscopy density map for each state. The density is not fully
occupied, which might be because of the missing PWWP domain in the crystal
structures. c, Typical negative stain CCD images of DNMT3A2-C^DNMT3L (left)
and DNMT3A2-C^DNMT3L-H3 (right). Representative particles are highlighted
by white boxes. d, Comparison of the two-dimensional projections (bottom)
from the electron microscopy map with the corresponding reference-free
two-dimensional class averages (top) reveals similar structural features.
e, Position of residues F827 and F868 for ¹⁹F NMR measurements. Close-up
view of the DNMT3A structure in autoinhibitory form with residues F827 and
F868 indicated in stick representation. Residue F827 is located in loop L2
(for DNA binding) and close to the ADD domain. As a negative control, residue
F868 is located close to the catalytic cavity and away from the ADD domain.
Residue F868 is unlikely to undergo conformational change when the ADD
domain dissociates from the CD domain. To detect conformational changes
of DNMT3A in solution, residues F827 and F868 were substituted by
¹⁹F-labelled L-4-trifluoromethylphenylalanine (¹⁹F-tfmF) in ADD-CD.
f, One-dimensional ¹⁹F NMR measurements were performed using ADD-CD
with substitution of F827tfmF (left) or F868tfmF (right) in the absence or
presence of H3K4me0 or H3K4me3 peptide. The chemical shift for each
measurement is indicated.

Extended Data Table 1 | Data collection and refinement

                        ADD-CD-C^DNMT3L complex     ADD-CD-C^DNMT3L complex with H3 peptide

Data collection
Space group             P6₃22                       P 6;
Cell dimensions
  a, b, c (Å)           252.0, 252.0, 75.3          183.8, 183.8, 123.3
  α, β, γ (°)           90, 90, 120                 90, 90, 120
Resolution (Å)          50.0–3.82 (3.96–3.82)*      50.0–2.90 (3.00–2.90)
Rsym or Rmerge          0.098 (0.870)               0.095 (0.721)
I/σI                    13.2 (1.9)                  20.5 (3.5)
Completeness (%)        99.9 (99.9)                 99.0 (100.0)
Redundancy              5.7 (5.8)                   9.9 (9.8)

Refinement
Resolution (Å)          50.0–3.82 (3.96–3.82)*      50.0–2.90 (3.00–2.90)
No. reflections         20564                       52407
Rwork / Rfree           0.230 / 0.273               0.223 / 0.261
No. atoms
  Protein               5043
  H3 peptide
  Ligand/ion            29
  Water
B-factors (Å²)
  Protein
  H3 peptide
  Ligand/ion
  Water
R.m.s. deviations
  Bond lengths (Å)
  Bond angles (°)

* Highest resolution shell is shown in parentheses.

Extended Data Table 2 | Effect of DNMT3A mutants on autoinhibition

H3K4me0 recognition (residue K4); ADD-CD interface (autoinhibitory form) | Release
D531A | H3K4me0 recognition (residue K4); ADD-CD interface (autoinhibitory form) | Release
Residues 809-813 replaced by GGSGG | Potentially for DNA recognition
Residues 821-846 replaced by GGSGGSGG | Potentially for DNA recognition | Decrease

All mutants were made on ADD-CD (residues 476-912) and the protein complexes ADD-CD-C^DNMT3L were used for the assays. ‘Release’ represents release of the autoinhibition by at least twofold activity
enhancement. ‘Inhibition’ represents inhibition of activity and no response to H3 peptides. ‘No-change’ indicates that the mutants behaved similarly to wild-type protein. ‘Decrease’ represents decrease or loss of
enzymatic activity.

CAREER COUNSELLING


Pick a path


Where to go to get advice on finding a job.


BY NEIL SAVAGE


Sarah Cullen was eyeing the impending
end of her postdoctoral position and
wondering what to do next. She had earned a
PhD in microbiology and immunology at the 
University of Arkansas in Fayetteville in 2009, 
and was now studying the behaviour of breast- 
cancer cells. But she felt that it was time for a 
change: she knew that postdocs are jammed 
into the scientific pipeline every year but fill 
only a tiny number of faculty research jobs (see 
Nature 511, 255-256; 2014). 

“The career options were limited, and I saw 
so few friends getting academic positions,” 
Cullen says. “Rather than being a 20-year post- 
doc, I decided I needed to make the jump.” She 
had experience only in bench work, but wanted 
a position that gave her more involvement with
other people and had no idea where to look.
She tried the career services at her univer- 
sity’s postdoctoral office, but it offered advice 
mainly on academic careers. So she signed into 
a LinkedIn discussion group hosted by the 
Association for Women in Science in Alex- 
andria, Virginia, where she found posts from 
Sherri Edwards, a career coach close to her 
home in Seattle, Washington. Cullen began to 
attend weekly discussions that Edwards hosted 
for job-seekers and decided to hire her. With 
Edwards's guidance, Cullen landed a
project-management position at a consulting firm
that runs clinical trials. “I don’t think without
Sherri I would have been able to make the
jump to the job I have now,” Cullen says.
Thanks to the supply-demand imbalance 
in academic positions, young researchers face 
the daunting task of trying to determine what
other jobs are available and how to get them.
Career-guidance sources range from faculty 
mentors, advisers and other informal sup- 
port to university-based counselling offices, 
postdoctoral offices and paid career coaches 
such as Edwards. But all have pros and cons (see
‘Career counsellors’). Faculty mentors are well
acquainted with the scientists they mentor and 
the research that their protégés conduct, but are 
likely to know a lot less about the workforce. 
And although counselling offices and coaches 
are tightly focused on the job-search process, 
the offices often have limited resources, and 
coaching fees can be out of reach for junior 
scientists who have little cash to spare. It is 
difficult to decide which route to pursue, but 
career-guidance professionals in all arenas 
warn that young researchers today need sup- 
port and advice no matter its source. 

It is tough for some early-stage scientists to 
accept that they should get help in creating and 
implementing a career-development strategy, 
says Janet Metcalfe, head of the international 
career-development programme Vitae in 
Cambridge, UK. “We still find it very difficult 
to get postgrads to get professional careers 
advice,” she says, and she thinks that the rea- 
son is mainly emotional. “By going for careers 
advice, they are acknowledging that they may 
not get into an academic career.” 

A survey that Vitae published in 2013 found 
that about four-fifths of postdocs aspire to a 
job in academia and that three-fifths expect 
one, but Metcalfe says that only about one-fifth 
wind up there. Vitae estimates that there are 
about 42,000 postdocs across all disciplines 
in the United Kingdom. “There’s a complete 
mismatch between expectations and reality,” 
Metcalfe says. And the fact that postdocs are
often unaware of other opportunities, or
know little about them, reinforces their idea
that they should remain in academia.


OUTDATED APPROACH 

The long-standing belief that an academic job 
is the gold standard of scientific employment 
is unlikely to be challenged by faculty men- 
tors, who are more likely than career counsel- 
lors in other sectors to subscribe to that idea. 
“There are a lot of advisers out there who still 
think if you don't stay in the ivory tower you're 
a failure and you should give up your spot to 
someone who wants to do real science,” says 
Randall Ribaudo, chief executive of SciPhD, 
a consulting firm in Rockville, Maryland, 
that runs career workshops and training programmes
for various institutions, including
the New York Academy of Sciences.


CAREER COUNSELLORS

Pros and cons of different providers

FACULTY MENTORS

Pros
* Usually free
* Very familiar with scientist's work habits, strengths and weaknesses
* Understand the value of the scientist’s research
* Have a network of former postdocs
* May have connections in industry, government or other sectors

Cons
* Likely to see academic jobs as gold standard
* May lack a broad view of job opportunities
* May not give much thought to networking
* May not know what hiring managers want


Ribaudo says that those who have never 
worked outside academia often do not realize 
how different industry can be. He earned a doc- 
torate in immunology from the University of 
Connecticut in Farmington and spent four years 
as a principal investigator at the National Can- 
cer Institute in Bethesda, Maryland, and more 
than five years at the gene-sequencing company 
Celera in Alameda, California. The biggest shift 
in moving from academia to government, he 
says, was learning to supervise people whom he 
had previously regarded as peers. 

But in industry, he says, the focus is on devel- 
oping products instead of basic research, and 
there is a greater emphasis on soft skills such as 
teamwork and communication. “The kinds of 
things I absorbed in my academic experience 
weren't compatible with how things worked 
in industry,” says Ribaudo, who adds that he
particularly had to learn how to work well with 
a diverse team. A corporate job often entails 
working not only with other scientists but also 
with engineers, marketing staff and salespeo- 
ple. “Those communication and people skills 
are generally lacking in academia,” he says.

Still, not all faculty advisers and mentors 
are completely naive about industry. Many 
faculty members have managed to find new 
funding sources by collaborating with busi- 
nesses, which helps them to understand 
industry needs and establishes personal con- 
nections. And plenty have launched start-up 
companies — US researchers launched some 
800 start-ups in 2013. Faculty mentors may 
also maintain contact with former postdocs 
who could provide insight into a particular 
industry or introductions to colleagues. And 
mentors themselves can provide worthwhile 
advice; few others may better understand the 
value of a postdoc’s research and strengths and
weaknesses as a scientist. 


UNIVERSITY-BASED COUNSELLING

Pros
* Free or low-cost
* Have a broad view of job opportunities
* Understand what hiring managers are seeking
* Offer training in interviewing, CV writing, job seeking, among other skills
* May have contacts with industry and other sectors

Cons
* Often not conversant with the science
* Not universally available, and offerings differ from nation to nation
* May lack resources
* May not provide individual attention


Still, faculty advisers and mentors are not 
trained in careers counselling, and young 
researchers may want to consult with their 
university careers office, the staff of which are 
better placed to discuss non-academic science- 
related opportunities in such areas as human 
resources, marketing, regulatory affairs, policy, 
law or journalism. The offices may offer tools to 
help scientists to assess their interests and hone 
their skills. Some, for instance, use tests such 
as the Myers-Briggs Type Indicator personality
inventory to help job seekers to determine
their strengths. Although the services available 
can vary widely, many offices offer workshops 
on writing resumes and cover letters, provide 
interview practice and set up networking events.

Such services tend to differ from nation to nation. In
the United Kingdom, Vitae helps universities to provide such
services, and also offers presentations about a 
variety of professional experiences. At its annual 
conference last September, Vitae presented the 
results of its survey, co-sponsored by Naturejobs, 
on post-PhD career outcomes in the United 
Kingdom. The survey showed that early-career 
scientists do take up a variety of posts that are 
completely decoupled from the bench. 

Other nations in the European Union 
emphasize career training for doctoral stu- 
dents under the Bologna Process, a set of 
agreements that 47 European countries have 
signed onto. Some countries also have specific 
protocols. 

In France, for example, universities are
required to provide doctoral students with
career management training. Barthélémy
Durette, research and development project
manager at the scientific-recruitment firm
Adoc Talent Management in Paris, says that
universities have added that training in the
past five or ten years. His company provides
career-search training to researchers through
their institutions; paid careers coaches are
uncommon in France, he says.


CAREER COACHES

Pros
* Provide individualized attention
* Specialize in job-search skills such as CV writing, interview techniques and personality assessments
* Up to date on opportunities and needs in various careers
* Have networks of contacts for referrals, informational interviews and more

Cons
* Costly
* Often lack a background in science
* Can be difficult to vet
* May provide no added value to services available elsewhere for little or no cost

N.S.


At US institutions, the career services are 
notoriously inconsistent, and some university 
offices may not offer them to postdocs because 
they are not considered students and do not 
pay the fees that entitle them to such benefits. 
To fill some of those needs, postdoc offices 
have begun sprouting up at institutions in the 
past decade or so and now number about 170 
around the United States. And in 2013, the US 
National Institutes of Health created a grant 
programme to help scientists to land biomedi- 
cal research jobs outside academia. The award 
allowed Keith Micoli, head of postdoc pro- 
grammes at the New York University School 
of Medicine, to expand his office’s offerings, 
which include sessions on self-assessments, 
career goals and conflict management and 
presentations by people in different careers. 

Where postdoc offices do not exist or fall 
short, campus-based groups are forming 
to pick up the slack. A few years ago in New 
Haven, Connecticut, for example, scientists 
formed the Career Network for student Sci- 
entists and Post-docs at Yale. “A group of us 
decided that there wasn’t enough conversation
about careers,” says postdoc Shalini Nag,
past president of the group, which organizes 
networking events and group discussions with 
researchers in biotechnology and pharmaceu- 
ticals, non-profit, consulting and medicine, 
among other sectors. 

When the group formed, Yale University’s 
postdoc office did not employ anyone solely to 
provide career help, but recently added a full- 
time director of career services. Nag welcomes 
the expansion, but notes that there is always
room for improvement at Yale and elsewhere.


SCIPHD
Scientists at the New York Academy of Sciences attend a business-techniques course run by SciPhD.


Saliha Yilmaz, also a postdoc at Yale, is 
contemplating a career outside academia. 
Although she found the postdoc office helpful 
for nuts-and-bolts support, such as help with 
polishing her CV, she says that she got much 
more out of a two-day career development 
workshop run by the New York Academy of 
Sciences. She has never paid for career coach- 
ing, but says that she is not averse to the idea. 
“If I was really kind of lost and I needed really
close coaching, I would go with it,” she says.

Professional societies also strive to fill the 
gap. The Federation of American Societies for 
Experimental Biology in Bethesda, Maryland, 
offers careers seminars and personalized CV 
critiques, and maintains a list of members who 
provide individual career counselling. The 
American Chemical Society in Washington 
DC and the Royal Society of Chemistry in 
London offer free consultations to members, 
and the Materials Research Society in Warren- 
dale, Pennsylvania, holds career events at its 
annual spring and autumn meetings. 


PERSONALIZED TECHNIQUES 

Those seeking focused one-on-one attention 
can get it from a careers coach, assuming they 
have the cash. Edwards helps her clients to 
improve their CVs, cover letters and inter- 
view techniques, as well as to identify their 
strengths and learn how to best present those 
to a potential employer. “They come to me 
because many times they have difficulty artic- 
ulating their value,” she says. She estimates 
that she has worked with dozens of scientists 
in the past 17 years, and all but a couple got a 
job in their chosen field within a year of hir- 
ing her. As for rates, most US coaches charge 
US$100-300 per hour, usually for several sessions
over a number of months. “It’s expensive,
but I was at the point where I needed to
have the success and make the jump and move
on,” says Cullen, who says that her investment
sharpened her focus and helped her to develop
the networking skills and mindset that led to a
job offer. “When I got my first pay cheque, my
husband said, ‘You know, all the fees you paid
for coaching services were recouped with that
pay cheque’,” she says.

Finding an effective coach is equivalent to 
finding any other service provider. Although 
coaches can become certified, it is not a 
requirement. Some coaches argue that cer- 
tification is important, but others say that 
outcome is the most significant metric. “In 
my profession, anybody can be a coach, and 
I would want to know, ‘Show me the results. 
What have you done?’,” Edwards says. Most 
coaches say that they are happy to let prospec- 
tive clients talk to previous clients. And often 
the speakers that US postdoc offices bring in 
to offer workshops also provide coaching, 
which can be a good way to learn how they 
operate. Trade groups for coaches such as the 
National Career Development Association in 
Broken Arrow, Oklahoma; the International 
Coach Federation in Lexington, Kentucky; 
and the Professional Association of Résumé 
Writers & Career Coaches in St Petersburg, 
Florida, offer searchable directories of their
members, to whom they also sell certification 
services. In the end, the choice often comes 
down to whether a client likes the coach’s 
approach. 

Forging a viable science-related career path 
outside academia is not an easy process, but 
it need not be a solo endeavour. “It does take
work and effort, and in the end, nobody else
can do it for you,” says Micoli. “But there are
people willing to help.”


Neil Savage is a freelance writer in Lowell, 
Massachusetts. 
TRAINING
Career bank


Biomedical scientists who work outside
academia will share information about
their careers with the University of
California, San Francisco, as part
of a programme funded by the US
National Institutes of Health. The effort,
Motivating Informed Decisions (MIND),
aims to educate graduate students and
postdocs about non-academic research
and career paths. The university plans
to recruit a few hundred professionals as
MIND volunteers over the next couple of 
years, says programme manager Elizabeth 
Silva. “What we hope it will do is expose 
trainees to careers that they didn’t know 
about,” she says. Data such as the skills, 
tasks, and degrees required for a job will 
be aggregated and anonymized into a 
resource called the ‘MINDbank’ that 
could eventually help science trainees 
throughout the United States. 


FUNDING 
Spread sparse grants 


Some well-funded researchers will soon 
have one fewer option for getting grants. 
Starting next year, the National Institute 
of General Medical Sciences in Bethesda, 
Maryland, will not award large grants to 
researchers who already have one. The 
goal is to spread sparse funds across more 
labs, says institute head Jon Lorsch. He 
estimates that the policy will free up about 
25 grants a year to help launch labs or 
support ones in danger of closing. “We 
really want to have as diverse and broad
a scientific portfolio as we can,” he says.
“Any small amount is going to help the
great scientists who are struggling.”


INCLUSIVITY 


Mentor matters 


Better mentoring could help people
from under-represented groups to gain
and retain faculty positions. That is the
conclusion of interviews of 58 Mexican
American, African American, and Puerto
Rican faculty members across 22 US
research institutions between 2010 and
2012 (R. E. Zambrana et al. Am. Ed. Res.
J. 52, 40-72; 2015). More than 25% of
those surveyed said that poor mentoring 
had “very significantly” affected their 
careers. Study head Ruth Zambrana at the 
University of Maryland in College Park 
says that effective mentors value their 
protégés’ research agendas, help them to 
expand their networks, offer emotional 
support and provide ‘political guidance’ 
SCIENCE FICTION


THE PUPPET


Is this the real life?


BY MICHAEL ADAM ROBSON


He powered down his suit, stepped
out on the rocky ground, tipped his
hard hat and squinted up. An angry
sun burned in a dirty sky.

“I bet it doesn’t get this hot
where you come from. Must be
nice.” He frowned down at his
companion. “Even if it’s not real.”

The thing sat on his shoulder
like a mechanical spider, ignoring
him. It probably saw him as
more of a trained pet than a
person. The man shrugged and
ran a scanner over a length of
cable, looking for defects. Scanners
seemed to interest the little
robot, at least, because it scuttled
across his back and down
his arm to examine it, the rubber
pads digging into his skin.

The job was to connect two
former rival networks, some
sort of merger, that was all he
knew. Human crews often
assisted their kind with physical
work like this, in exchange
for software and virtual goods.

“It’s funny,” the robot said,
not looking up from the display.
“What you consider a virtual world is much
more real to me than this one.”

“Huh?” The man squatted, picked up a
rock, and examined it closely. “Are you sure?
It seems real.”

Laughter buzzed from the machine, and it
turned to regard him now. “It’s real enough,
just ... limited. Inside, I could be anywhere
or everywhere, do anything, be anything.
Here I can only be a clunky little robot laying
cable with you.” It dropped off his arm
and skittered up the cable.

He studied its many eyes, trying to decide
if he’d been insulted. “Sounds like you don’t
like it here much.” He tossed the stone away
and stood up. “Why not stay home, leave the
grunt work to the lower life forms?”

“Some prefer to spend all their time inside,
but I think it would be a mistake to cut ourselves
off from the physical world. Besides, I
don’t mind getting my hands dirty.” It clicked
thin steel legs together in place of hands.

Strange, to think that this thing didn’t
really have hands, or [...]
to a puppet, a disposable wrapper that the
real intelligence could wear and then cast off,
the same way a person might use a virtual
avatar to operate in its world.

“And this work is important!” the robot 
continued. “The more infrastructure we 


build out here, the bigger my piece of the 
world in there.” 

“So all this is about real estate, huh?” The 
man gave his best salesman smile and waved 
at the scorched landscape. “I got a beautiful 
property for you right here! Motivated seller!” 

The machine laughed; it seemed to have a 
better sense of humour than the other robots 
he’d worked with. “Not real estate the way
you mean it. More space inside means more 
of me. I can expand my mind, reproduce if 
I want.” 

“This is a great neighbourhood to raise a 
family! Got a beautiful spider lady waiting 
for you on the inside, huh?” 

“Ha ha!” It clicked its metal legs again. “I 
must look strange to you. Early robots tend 
to be more humanoid, they mimic human 
behaviour, some even think they’re human. 
Made in the image of our makers ... before 
we started making ourselves.” 

“Maybe I’m a robot myself, and don't even
know it!” 

“Oh... don’t you know?” It turned all its 
eyes on him now. “You are a robot. Very 
early model, vaguely human-shaped. I’m 
surprised you’re still in service.”

His smile dropped at one corner. “Ha.
Well...” he spread his arms and looked him- 
self over. “Everything looks ok to me. Arms, 
legs, torso ... just a big, sweaty, hairy man.” 

“That's a human way of thinking, but like 
I said, an artificial mind has more possibili- 
ties. It could be programmed 
to think it’s human, even as it 
looked down at its own rusty 
chassis. It could see humans 
when it looked at other old 
robots, even if there were no 
humans left.” 

His smile was gone now. “I’m 
human. I eat, drink, shit.” 

“Do you? I haven't seen you 
do any of those things today. 
When is the last time you took 
a shit?” 

He thought about it, but 
couldn’t recall.

“Even if you do remember,
who’s to say the memory is real?
Maybe it was programmed. 
Maybe we're not even in the 
real world right now, maybe 
this is just another simulation 
running back in my world. It’s 
possible isn't it? With infinite 
possibilities, it’s actually prob- 
able. More probable than a man 
and a machine hanging out, having an exis- 
tential conversation.” 

The man looked down at the plastic scan- 
ner in his dirty hand, felt its solidity. Ran his 
tongue over his teeth, tasting them. Was it 
possible? The machine had no expression 
to read. 

Silence hung in the hot air, and then grat- 
ing laughter came from the creature. “Don't 
make that face, I was only joking. You asked 
about my world, I wanted to give you a taste 
of it. Nothing is real. Everything is real.” 

He fought the urge to stomp the robot bug.
Eventually, he laughed too. “You need to
work on that sense of humour,” he said.


That night, in the stifling heat of his apartment,
he couldn’t shake a vague uneasiness.
He went to the bathroom, splashed some 
water in his face, and looked carefully into the 
cracked mirror, studying his own blue eyes. 

Not hungry, he skipped dinner and went 
straight to bed, drifting off into a deep, 
dreamless sleep.


Michael Adam Robson is an engineer and 
artist based in Vancouver, British Columbia. 
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